Ito processes are the most common form of continuous semimartingales,
and include diffusion processes. This paper is concerned
with the nonparametric regression relationship between two such Ito 
processes. We are interested in the quadratic variation (integrated 
volatility) of the residual in this regression, over a unit of time (such 
as a day). A main conceptual finding is that this quadratic variation 
can be estimated almost as if the residual process were observed, 
the difference being that there is also a bias which is of the same 
asymptotic order as the mixed normal error term. 

The proposed methodology, "ANOVA for diffusions and Ito pro- 
cesses," can be used to measure the statistical quality of a parametric 
model and, nonparametrically, the appropriateness of a one-regressor 
model in general. On the other hand, it also helps quantify and char- 
acterize the trading (hedging) error in the case of financial applica- 
tions. 

1. Introduction. We consider the regression relationship between two
stochastic processes $\Xi_t$ and $S_t$,

(1.1)  $d\Xi_t = \rho_t\,dS_t + dZ_t, \quad 0 \le t \le T,$

where $Z_t$ is a residual process. We suppose that the processes $S_t$ and
$\Xi_t$ are observed at discrete sampling points $0 = t_0 < \cdots < t_k = T$.
With the advent of high frequency financial data, this type of regression
has been a topic of growing interest in the literature; see Section 2.4.
The processes $\Xi_t$ and $S_t$ will be Ito processes, which are the most
commonly used type of continuous semimartingale. Diffusions are a special
case of Ito processes. Definitions are made precise in Section 2.1 below.
The differential in (1.1) is that of an Ito stochastic integral, as defined
in Chapter I.4.d of [33] or Chapter 3.2 of [34].

Our purpose is to assess nonparametrically what is the smallest possible
residual sum of squares in this regression. Specifically, for two processes
$X_t$ and $Y_t$, denote the quadratic covariation between $X_t$ and $Y_t$ on
the interval $[0,T]$ by

(1.2)  $\langle X,Y\rangle_T = \lim_{\max_i (t_{i+1}-t_i) \to 0} \sum_i (X_{t_{i+1}} - X_{t_i})(Y_{t_{i+1}} - Y_{t_i}),$

where $0 = t_0 < \cdots < t_k = T$. (This object exists by Definition I.4.45
or Theorem I.4.47 in [33], pages 51-52, and similar statements in [34] and
[38].) In particular, $\langle Z,Z\rangle_T$, the quadratic variation of
$Z_t$, is the sum of squares of the increments of the process $Z$ under the
idealized condition of continuous observation. We wish to estimate, from
discrete-time data,

(1.3)  $\min_\rho \langle Z,Z\rangle_T,$

where the minimum is over all adapted regression processes $\rho$.

An important motivating application for the system (1.1) is that of
statistical risk management in financial markets. We suppose that $S_t$ and
$\Xi_t$ are the discounted values of two securities. At each time t, a
financial institution is short one unit of the security represented by
$\Xi$, and at the same time seeks to offset as much risk as possible by
holding $\rho_t$ units of security $S$. $Z_t$, as given by (1.1), is then the
gain/loss up to time t from following this "risk-neutral" procedure. In a
complete (idealized) financial market, $\min_\rho \langle Z,Z\rangle$ is zero;
in an incomplete market, $\min_\rho \langle Z,Z\rangle$ quantifies the
unhedgeable part of the variation in asset $\Xi$, when one adopts the best
possible strategy using only asset $S$. This lower bound (1.3) is the target
that a risk management group wants to monitor and control.

The statistical importance of (1.3) is this. Once you know how to estimate
(1.3), you know how to assess the goodness of fit of any given estimation
method for $\rho_t$. You also know more about the appropriateness of a
one-regressor model of the form (1.1). We return to the goodness of fit
questions in Section 4. A model example of both statistical and financial
importance is given in Section 2.2.

To discuss the problem of estimating (1.3), consider first how one would
find this quantity if the processes $\Xi$ and $S$ were continuously observed.
Note that from (1.1), one can write

(1.4)  $\langle Z,Z\rangle_t = \langle \Xi,\Xi\rangle_t + \int_0^t \rho_u^2\,d\langle S,S\rangle_u - 2\int_0^t \rho_u\,d\langle \Xi,S\rangle_u$

([33], I.4.54, page 55). Here $d\langle X,Y\rangle_t$ is the differential of
the process $\langle X,Y\rangle_t$ with respect to time. We shall typically
assume that $\langle X,Y\rangle_t$ is absolutely continuous as a function of
time (for any realization). It is easy to see that the solution in $\rho_t$
to the problem (1.3) is uniquely given by

(1.5)  $\rho_t = \frac{d\langle \Xi,S\rangle_t}{d\langle S,S\rangle_t} = \frac{\langle \Xi,S\rangle'_t}{\langle S,S\rangle'_t},$

where $\langle \Xi,S\rangle'_t$ is the derivative of $\langle \Xi,S\rangle_t$
with respect to time. Apart from its statistical significance, in financial
terms $\rho$ is the hedging strategy associated with the minimal martingale
measure (see, e.g., [16] and [41]).
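
The step from (1.4) to (1.5) can be sketched by completing the square. The display below is only an illustration: it assumes that $\langle \Xi,S\rangle_t$ and $\langle S,S\rangle_t$ are absolutely continuous and that $\langle S,S\rangle'_u > 0$, both of which are assumptions stated elsewhere in the paper rather than derived here.

```latex
\begin{align*}
\langle Z,Z\rangle_T
  &= \langle \Xi,\Xi\rangle_T
     + \int_0^T \bigl(\rho_u^2\,\langle S,S\rangle'_u
                      - 2\rho_u\,\langle \Xi,S\rangle'_u\bigr)\,du \\
  &= \langle \Xi,\Xi\rangle_T
     + \int_0^T \langle S,S\rangle'_u
       \Bigl(\rho_u - \frac{\langle \Xi,S\rangle'_u}{\langle S,S\rangle'_u}\Bigr)^{2} du
     - \int_0^T \frac{(\langle \Xi,S\rangle'_u)^2}{\langle S,S\rangle'_u}\,du .
\end{align*}
```

Only the middle term depends on $\rho$. It is nonnegative and vanishes exactly when $\rho_u = \langle \Xi,S\rangle'_u / \langle S,S\rangle'_u$ for almost every $u$, which is the choice in (1.5).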

The problem (1.3) then connects to an ANOVA, as follows. Let $Z_t$ be
the residual in (1.1) for the optimal $\rho_t$, so that the quantity in (1.3)
can be written simply as $\langle Z,Z\rangle_T$. In analogy with regular
regression, substituting (1.5) into (1.4) gives rise to an ANOVA
decomposition of the form

(1.6)  $\underbrace{\langle \Xi,\Xi\rangle_T}_{\text{total SS}} = \underbrace{\int_0^T \rho_u^2\,d\langle S,S\rangle_u}_{\text{SS explained}} + \underbrace{\langle Z,Z\rangle_T}_{\text{RSS}},$

where "SS" is the abbreviation for (continuous) "sum of squares," and "RSS"
stands for "residual sum of squares." Under continuous observation,
therefore, one solves the problem (1.3) by using the $\rho$ and $Z$ defined
above. Our target of inference, $\langle Z,Z\rangle_t$, would then be
observable. Discreteness of observation, however, creates the need for
inference.

The main theorems in the current paper are concerned with the asymptotic
behavior of the estimated RSS, as more discrete observations are available
within a fixed time window. There will be some choice in how to select
the estimator $\widehat{\langle Z,Z\rangle}_t$. We consider a class of such
estimators $\widehat{\langle Z,Z\rangle}^{(\alpha)}_t$. No matter which of
our estimators is used, we get the decomposition

(1.7)  $\widehat{\langle Z,Z\rangle}_t - \langle Z,Z\rangle_t \approx \mathrm{bias}_t + ([Z,Z]_t - \langle Z,Z\rangle_t)$

to first-order asymptotically, where $[Z,Z]$ is the sum of squares of the
increments of the (unseen) process $Z$ at the sampling points,
$[Z,Z]_t = \sum_i (Z_{t_{i+1}} - Z_{t_i})^2$; see the definition (2.4) below.

A primary conceptual finding in (1.7) is the clear cut effect of the two 
sources behind the asymptotics. The form of the bias depends only on the 
choice of the estimator for the quadratic variation. On the other hand, the 
variation component is common for all the estimators under study; it comes 
only from the discretization error in discrete time sampling. 

It is worthwhile to point out that our problem is only tangentially
related to that of estimating the regression coefficient $\rho_t$. This is in
the sense that the asymptotic behavior of nonparametric estimators of
$\rho_t$ does not directly imply anything about the behavior of estimators
of $\langle Z,Z\rangle_t$. To illustrate this point, note that the
convergence rates are not the same for the two types of estimators
[$O_p(n^{-1/4})$ vs. $O_p(n^{-1/2})$], and that the variance of
the estimator we use for $\rho_t$ becomes the bias in one of our estimators
of $\langle Z,Z\rangle_t$ [compare equation (2.9) in Section 2.4 with
Remark 1 in Section 3]. For further discussion, see Section 2.4. Depending
on the goal of inference, statistical estimates $\hat\rho_t$ of the
regression coefficient can be obtained using parametric methods, or
nonparametric ones that are either local in space or in time, as discussed
in Sections 2.4 and 4.1 and the references cited in these sections. In
addition, it is also common in financial contexts to use calibration
("implied quantities," see Chapters 11 and 17 in [28]).

The organization is as follows: in Section 2 we establish the framework 
for ANOVA, and we introduce a class of estimators of the residual quadratic 
variation. Our main results, in Section 3, provide the distributional proper- 
ties of the estimation errors for RSS. See Theorems 1 and 2. In Section 4 
we discuss the statistical application of the main theorems. Parametric and 
nonparametric estimation are compared in the context of residual analysis. 
The goodness of fit of a model is addressed. Broad issues, including the 
analysis of variation versus analysis of variance, the moderate level of ag- 
gregation versus long run, the actual probability distribution versus the risk 
neutral probability distribution in the derivative valuation setting, are dis- 
cussed in Sections 4.4 and 4.5. After concluding in Section 5, we give proofs 
in Sections 6 and 7. 

2. ANOVA for Ito processes: framework. 

2.1. Ito processes, quadratic variation and diffusions. The assumptions
and definitions in the following two subsections are used throughout the 
paper, sometimes without further reference. First of all, we shall be working 
with a fixed filtered probability space. 

System Assumption I. We suppose that there is an underlying filtered
probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)_{0 \le t \le T}$
satisfying the "usual conditions" (see, e.g., Definition 1.3, page 2, of
[33], also in [34]).

We shall then be working with Ito processes adapted to this system, as 
follows. Note that Markov diffusions are a special case. 

Definition 1 (Ito processes). By saying that X is an Ito process, we
mean that X is continuous (a.s.), $(\mathcal{F}_t)$-adapted, and that it can
be represented as a smooth process plus a local martingale,

(2.1)  $X_t = X_0 + \int_0^t \tilde X_u\,du + \int_0^t \sigma_u^X\,dW_u,$

where W is a standard $((\mathcal{F}_t), P)$-Brownian motion, $X_0$ is
$\mathcal{F}_0$ measurable, and the coefficients $\tilde X_t$ and
$\sigma_t^X$ are predictable, with $\int_0^T |\tilde X_u|\,du < +\infty$ and
$\int_0^T (\sigma_u^X)^2\,du < +\infty$. We also write

(2.2)  $X_t = X_0 + X_t^{\mathrm{DR}} + X_t^{\mathrm{MG}}$

as shorthand for the drift and local martingale parts of the Doob-Meyer
decomposition in (2.1).



A more abstract way of putting this definition is that $X_t$ is an Ito
process if it is a continuous semimartingale ([33], Definition I.4.21,
page 42) whose finite variation and local martingale parts, given by (2.2),
satisfy that both $X_t^{\mathrm{DR}}$ and the quadratic variation
$\langle X^{\mathrm{MG}}, X^{\mathrm{MG}}\rangle_t$ are absolutely continuous.

Obviously, an Ito process is a special semimartingale, also in the sense of
the same definition of [33].

Diffusions are normally taken to be a special case of Ito processes, where
one can write $(\sigma_t^X)^2 = a(X_t,t)$ and $\tilde X_t = b(X_t,t)$, and
similarly in the multidimensional setting. For a description of the link,
we refer to Chapter 5.1 of [34].

The Ito process definition extends to a two- or multi-dimensional process,
say, $(X_t,Y_t)$, by requiring each component $X_t$ and $Y_t$ individually to
be an Ito process. Obviously, $W$ is typically different for different Ito
processes $X$. For two processes $X$ and $Y$, the relationship between $W^X$
and $W^Y$ can be arbitrary.

The quadratic variation $\langle X,X\rangle_t$ [formula (1.2)] can now be
expressed in terms of the representation (2.1) by ([33], I.4.54, page 55)

$\langle X,X\rangle_t = \int_0^t (\sigma_u^X)^2\,du.$

We denote by $\langle X,X\rangle'_t$ the derivative of
$\langle X,X\rangle_t$ with respect to time t. Then
$\langle X,X\rangle'_t = (\sigma_t^X)^2$, and this quantity (or its square
root) is often known as volatility in the finance literature.

Both quadratic variation and covariation are absolutely continuous. This 
follows from the Ito process assumption and from the Kunita-Watanabe 
inequality (see, e.g., page 69 of [38]). 



Definition 2 (Volatility as an Ito process). Denote by
$\langle X,Y\rangle'_t$ the derivative of $\langle X,Y\rangle_t$ with respect
to time. We shall often suppose that $\langle X,Y\rangle'_t$ is itself an Ito
process. For ease of notation, we then write its Doob-Meyer decomposition as

$d\langle X,Y\rangle'_t = dD_t^{XY} + dR_t^{XY} = \tilde D_t^{XY}\,dt + dR_t^{XY}.$

Note that the quadratic variation of $\langle X,Y\rangle'$ is the same as
$\langle R^{XY},R^{XY}\rangle$, and that, in our notation above,
$D^{XY} = (\langle X,Y\rangle')^{\mathrm{DR}}$ and
$R^{XY} = (\langle X,Y\rangle')^{\mathrm{MG}}$.
2.2. Model example: Adequacy of the one factor interest rate model. A
common model for the risk free short term interest rate is given by the
diffusion

(2.3)  $dr_t = \mu(r_t)\,dt + \gamma(r_t)\,dW_t,$

where $W_t$ is a Brownian motion. For example, in [43], one takes
$\mu(r) = \kappa(\alpha - r)$, and $\gamma(r) = \gamma$ = constant, while Cox,
Ingersoll and Ross [6] use the same function $\mu$, but
$\gamma(r) = \gamma r^{1/2}$. For more such models, and a brief
financial introduction, see, for example, [28]. A discussion and review of
estimation methods is given by Fan [14].

One of the implications of this so-called one factor model is the
following. Suppose $S_t$ and $\Xi_t$ are the values of two zero coupon
government bonds with different maturities. Financial theory then predicts
that, until maturity, $S_t = f(r_t,t)$ and $\Xi_t = g(r_t,t)$ for two
functions $f$ and $g$ (see [28], Chapter 21, for details and functional
forms). Under this model, therefore, the relationship
$d\Xi_t = \rho_t\,dS_t$ holds from time zero until the maturity of the
shorter term bond. It is easy to see that
$d\langle S,S\rangle_t = f'_r(r_t,t)^2\,\gamma(r_t)^2\,dt$ and
$d\langle \Xi,S\rangle_t = f'_r(r_t,t)\,g'_r(r_t,t)\,\gamma(r_t)^2\,dt$, with
$\rho_t = d\langle \Xi,S\rangle_t / d\langle S,S\rangle_t = g'_r(r_t,t)/f'_r(r_t,t)$.
Here $f'_r$ is the derivative of $f$ with respect to $r$, and similarly for
$g'_r$.

The one factor model is only an approximation, and to assess the adequacy
of the model, one would now wish to estimate $\langle Z,Z\rangle_t$ over
different time intervals. This provides insight into whether it is
worthwhile to use a one-factor model at all. If the conclusion is
satisfactory, one can estimate the quantities $\mu$ and $\gamma$ (and, hence,
$f$ and $g$) with parametric or nonparametric methods (see Sections 2.4 and
4.1), and again, use our methods to assess the fit of the specific model,
for example, as discussed in Section 4.1.

2.3. Finitely many data points. We now suppose that we observe processes,
in particular, $S_t$ and $\Xi_t$, on a finite set (partition, grid)
$\mathcal{G} = \{0 = t_0 < t_1 < \cdots < t_k = T\}$ of time points in the
interval $[0,T]$. We take the time points to be nonrandom, but possibly
irregularly spaced. Note that this also covers the case where the $t_i$ are
random but independent of the processes we seek to observe, so that one can
condition on $\mathcal{G}$ to get back to irregular but nonrandom spacing.

Definition 3 (Observed quadratic variation). For two Ito processes X
and Y observed on a grid $\mathcal{G}$,

(2.4)  $[X,Y]^{\mathcal{G}}_t = \sum_{t_{i+1} \le t} (\Delta X_{t_i})(\Delta Y_{t_i}),$

where $\Delta X_{t_i} = X_{t_{i+1}} - X_{t_i}$. When there is no ambiguity,
we use $[X,Y]_t$ for $[X,Y]^{\mathcal{G}}_t$, or $[X,Y]^{(n)}_t$ in case of a
sequence $\mathcal{G}_n$.

Note that this is not the same as the usual definition of $[X,Y]_t$ for Ito
processes. We use $\langle X,Y\rangle_t$ to refer to a continuous process,
while $[X,Y]_t$ refers to a (cadlag) process which only changes values at
the partition points $t_i$.
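
As an illustration of Definition 3, the observed quadratic covariation can be computed directly from gridded data. The sketch below is not from the paper; the function name `observed_covariation` and the toy data are hypothetical, and the grid is assumed to be strictly increasing.

```python
from typing import Sequence


def observed_covariation(times: Sequence[float], x: Sequence[float],
                         y: Sequence[float], t: float) -> float:
    """Return [X, Y]_t: the sum of dX_i * dY_i over intervals with t_{i+1} <= t."""
    if not (len(times) == len(x) == len(y)):
        raise ValueError("times, x and y must have the same length")
    total = 0.0
    for i in range(len(times) - 1):
        if times[i + 1] > t:  # interval ends after t; the grid is increasing, so stop
            break
        total += (x[i + 1] - x[i]) * (y[i + 1] - y[i])
    return total


# Hypothetical toy observations on an irregular grid.
grid = [0.0, 0.25, 0.5, 1.0]
x_obs = [0.0, 1.0, 3.0, 2.0]
y_obs = [0.0, 2.0, 1.0, 1.0]

qv_x = observed_covariation(grid, x_obs, x_obs, 1.0)    # squared increments: 1 + 4 + 1
cov_xy = observed_covariation(grid, x_obs, y_obs, 0.3)  # only the first interval ends by 0.3
```

Because only intervals that end at or before t contribute, the resulting process is piecewise constant and jumps at the partition points, in line with the cadlag remark above.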

Since the results of this paper rely on asymptotics, we shall take limits
with the help of a sequence of partitions
$\mathcal{G}_n = \{0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_{k_n}^{(n)} = T\}$.
As $n \to +\infty$, we let $\mathcal{G}_n$ become dense in $[0,T]$, in the
sense that the mesh

(2.5)  $\delta^{(n)} = \max_i |\Delta t_i^{(n)}| \to 0.$

Here $\Delta t_i^{(n)} = t_{i+1}^{(n)} - t_i^{(n)}$. In other words, the mesh
is the maximum distance between the $t_i^{(n)}$'s. On the other hand, T
remains fixed (except briefly in Section 4.5).

In this case, $[X,Y]_t = [X,Y]^{\mathcal{G}_n}_t$ converges to
$\langle X,Y\rangle_t$ uniformly in probability; see [33], Theorem I.4.47,
page 52, and [38], Theorem II.23, page 68. More is true; see Section 5 of
[32], and (our) Sections 2.8 and 6 below.

Note that, under (2.5), $k_n = |\mathcal{G}_n| \to \infty$. It is often
convenient to consider the average distance between successive observation
points,

(2.6)  $\overline{\Delta t}^{(n)} = \frac{T}{k_n};$

see Assumption A(i) below.

2.4. The regression problem, and the estimation of $\rho_t$. The processes
in (1.1) will be taken to satisfy the following.

System Assumption II. We let $\Xi$ and $S$ be Ito processes. We assume
that

(2.7)  $\inf_{t \in [0,T]} \langle S,S\rangle'_t > 0$ almost surely.

Under (2.7), $\rho_t$, given by (1.5), is well defined by the
Kunita-Watanabe inequality.

As noted in the Introduction, under continuous observation of $\Xi_t$ and
$S_t$, one can also directly observe the optimal $\rho_t$ and $Z_t$. Our
target of inference, $\langle Z,Z\rangle_t$, would then be observable.
Discreteness of observation, however, creates the need for inference.

In a noncontinuous world, where $\Xi$ and $S$ can only be observed over grid
times, the most straightforward estimator of $\rho$ is

(2.8)  $\hat\rho_t = \frac{\widehat{\langle \Xi,S\rangle'}_t}{\widehat{\langle S,S\rangle'}_t} = \frac{[\Xi,S]_t - [\Xi,S]_{t-h_n}}{[S,S]_t - [S,S]_{t-h_n}}.$

For simplicity, this estimator is the one we shall use in the following. The
results easily generalize when more general kernels are used. Note that we
have to use a smoothing bandwidth $h_n$. There will naturally be a tradeoff
between $h_n$ and $\overline{\Delta t}^{(n)}$. As we now argue, this
typically results in $h_n = O((\overline{\Delta t}^{(n)})^{1/2})$.
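
The rolling-window estimator (2.8) can be sketched as follows. This is an illustrative implementation only: the function name `rho_hat` and the toy data are hypothetical, and the window is taken to consist of the grid intervals that end in $(t - h, t]$, which is what the difference $[\cdot,\cdot]_t - [\cdot,\cdot]_{t-h}$ selects under Definition 3.

```python
from typing import Sequence


def rho_hat(times: Sequence[float], xi: Sequence[float], s: Sequence[float],
            t: float, h: float) -> float:
    """Estimate rho_t as ([Xi,S]_t - [Xi,S]_{t-h}) / ([S,S]_t - [S,S]_{t-h})."""
    if not (len(times) == len(xi) == len(s)):
        raise ValueError("times, xi and s must have the same length")
    num = 0.0  # windowed observed covariation of Xi and S
    den = 0.0  # windowed observed quadratic variation of S
    for i in range(len(times) - 1):
        end = times[i + 1]
        if end > t:          # grid is increasing: nothing further can contribute
            break
        if end > t - h:      # interval ends inside the window (t - h, t]
            d_s = s[i + 1] - s[i]
            num += (xi[i + 1] - xi[i]) * d_s
            den += d_s * d_s
    if den == 0.0:
        raise ValueError("S has no observed variation in the window (t - h, t]")
    return num / den


# Hypothetical toy data in which Xi = 2 * S exactly, so rho should be recovered as 2.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
s_obs = [0.0, 1.0, -1.0, 2.0, 3.0]
xi_obs = [2.0 * v for v in s_obs]
estimate = rho_hat(grid, xi_obs, s_obs, t=1.0, h=0.5)
```

A wider window averages over more increments (less noise) but tracks a time-varying coefficient less closely, which is the tradeoff behind the choice of bandwidth.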

Asymptotics for estimators of the form
$\widehat{\langle X,Y\rangle'}_t = ([X,Y]_t - [X,Y]_{t-h_n})/h_n$ and, hence,
for $\hat\rho_t$, are given by Foster and Nelson [17] and Zhang [44]. Let
$\overline{\Delta t}^{(n)}$ be the average observation interval, assumed to
converge to zero. If $\langle X,Y\rangle'_t$ is an Ito process with
nonvanishing volatility, then it is optimal to take
$h_n = O((\overline{\Delta t}^{(n)})^{1/2})$, and
$(\overline{\Delta t}^{(n)})^{-1/4}(\widehat{\langle X,Y\rangle'}_t - \langle X,Y\rangle'_t)$
converges in law (for each fixed t) to a (conditional on the data) normal
distribution with mean zero and random variance. (The mode of convergence is
the same as in Proposition 1.) The asymptotic distributions are
(conditionally) independent for different times t. If
$\langle X,Y\rangle'_t$ is smooth, on the other hand, the rate becomes
$(\overline{\Delta t}^{(n)})^{1/3}$ rather than
$(\overline{\Delta t}^{(n)})^{1/4}$, and the asymptotic distribution
contains both bias and variance.

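The same applies to the estimator $\hat\rho_t$. In the case when $\Xi$ and
$S$ have a diffusion component, the estimator has (random) asymptotic
variance

(2.9)  $V_{\hat\rho-\rho}(t) = \frac{1}{3c}\langle \rho,\rho\rangle'_t + c\,H'(t)\Bigl(\frac{\langle \Xi,\Xi\rangle'_t}{\langle S,S\rangle'_t} - \rho_t^2\Bigr)$

whenever $(\overline{\Delta t}^{(n)})^{1/2}/h_n \to c \in (0,\infty)$; see
[44]. $H(t)$ is defined in Section 2.6 below.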

The scheme given in (2.8) is only one of many for estimating
$\langle X,Y\rangle'_t$ by using methods that are local in time. In
particular, Genon-Catalot, Laredo and Picard [21] use wavelets for this
purpose and determine rates of convergence and limit distributions under
the assumption that $\langle X,Y\rangle'_t$ is deterministic and has
smoothness properties.

Other important literature in this area seeks to estimate
$\langle X,Y\rangle'_t$ as a function of the underlying state variables by
methods that are local in space; see, in particular, [15, 26, 31]. The
typical setup is that $U = (X,Y,\ldots)$ is a Markov process, so that
$\langle X,Y\rangle'_t = f(U_t)$ for some function $f$, and the problem is
to estimate $f$. If all coefficients in the Markov diffusion are smooth of
order s, and subject to regularity conditions, the function $f$ can be
estimated with a rate of convergence of
$(\overline{\Delta t}^{(n)})^{s/(1+2s)}$.

The convergence obtained for the estimator of $f$ under Markov assumptions
is considerably faster than what can be obtained for (2.8). It does,
however, rely on stronger (Markov) assumptions than the ones (Ito
processes) that we shall be working with in this paper. Since we shall only
be interested in $\rho_t$ as a (random) function of time, our development
does not require a Markov specification and, in particular, does not require
full knowledge of what potential state variables might be.

Of course, this is just a subset of the literature for estimation of Markov
diffusions. See Section 4.1 for further references.

We emphasize that the general ANOVA approach in this paper can be
carried out with other schemes for estimating $\rho_t$ than the one given in
(2.8). We have seen this as a matter of fine tuning and, hence, beyond the
scope of this paper. This is because Theorems 1 and 2 achieve the same rate
of convergence as the one obtained in Proposition 1.

2.5. Estimation schemes for the residual quadratic variation
$\langle Z,Z\rangle_t$. We now return to the estimation of the quadratic
variation $\langle Z,Z\rangle$ of residuals. Given the discrete data of
$(\Xi,S)$, there are different methods to estimate the residual variation.

One scheme is to start with model (1.1). For a fixed grid $\mathcal{G}$,
one first estimates $\Delta Z_{t_i}$ through the relation
$\Delta \hat Z_{t_i} = \Delta \Xi_{t_i} - \hat\rho_{t_i}(\Delta S_{t_i})$,
where all increments are from time $t_i$ to $t_{i+1}$, and then obtains the
quadratic variation (q.v. hereafter) of $\hat Z$. This gives an estimator of
$\langle Z,Z\rangle$ as

(2.10)  $[\hat Z,\hat Z]_t = \sum_{t_{i+1} \le t} (\Delta \hat Z_{t_i})^2 = \sum_{t_{i+1} \le t} [\Delta \Xi_{t_i} - \hat\rho_{t_i}(\Delta S_{t_i})]^2,$

where the notation of square brackets (discrete time-scale q.v.) is invoked,
since $\Delta \hat Z_{t_i}$ is the increment over discrete times.

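Alternatively, one can directly analyze the ANOVA version (1.6) of the
model, where
$d\langle Z,Z\rangle_t = d\langle \Xi,\Xi\rangle_t - \rho_t^2\,d\langle S,S\rangle_t$.
This yields a second estimator of $\langle Z,Z\rangle_t$,

(2.11)  $\widehat{\langle Z,Z\rangle}^{(1)}_t = \sum_{t_{i+1} \le t} [(\Delta \Xi_{t_i})^2 - \hat\rho_{t_i}^2(\Delta S_{t_i})^2].$

In general, any convex combination of these two,

(2.12)  $\widehat{\langle Z,Z\rangle}^{(\alpha)}_t = (1-\alpha)[\hat Z,\hat Z]_t + \alpha\,\widehat{\langle Z,Z\rangle}^{(1)}_t,$

would seem like a reasonable way to estimate $\langle Z,Z\rangle_t$, and this
is the class of estimators that we shall consider. Particular properties
will be seen to attach to $\widehat{\langle Z,Z\rangle}^{(1/2)}_t$, which we
shall also denote by $\widetilde{\langle Z,Z\rangle}_t$. For a start, it is
easy to see that

(2.13)  $\widetilde{\langle Z,Z\rangle}_t = [\Xi,\hat Z]_t.$

Note that (2.13) also has a direct motivation from the continuous model.
Since $\langle S,Z\rangle_t = 0$, (1.1) yields that
$\langle \Xi,Z\rangle_t = \langle Z,Z\rangle_t$.

We establish the statistical properties of the estimator
$\widehat{\langle Z,Z\rangle}^{(\alpha)}_t$ and, in particular, those of
$[\hat Z,\hat Z]_t$ and $\widetilde{\langle Z,Z\rangle}_t$ in Section 3.
Asymptotic properties are naturally studied with the help of small interval
asymptotics.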
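
The three estimators (2.10)-(2.12) differ only in how the same increments are combined, which the following sketch makes explicit. It is illustrative only: the function name `residual_qv_estimators` and the toy numbers are hypothetical, and the per-interval values of $\hat\rho_{t_i}$ are assumed to have been computed beforehand, for instance with (2.8).

```python
from typing import Sequence, Tuple


def residual_qv_estimators(d_xi: Sequence[float], d_s: Sequence[float],
                           rho: Sequence[float],
                           alpha: float = 0.5) -> Tuple[float, float, float]:
    """Return ([Zhat,Zhat], the (2.11) estimator, the alpha-combination (2.12)).

    d_xi[i], d_s[i] are the increments of Xi and S over the i-th grid interval,
    and rho[i] is the estimated regression coefficient used on that interval.
    """
    if not (len(d_xi) == len(d_s) == len(rho)):
        raise ValueError("d_xi, d_s and rho must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # (2.10): sum of squared estimated residual increments.
    hat_zz = sum((a - r * b) ** 2 for a, b, r in zip(d_xi, d_s, rho))
    # (2.11): total sum of squares minus the estimated explained part.
    anova = sum(a * a - (r * b) ** 2 for a, b, r in zip(d_xi, d_s, rho))
    # (2.12): convex combination of the two.
    combined = (1.0 - alpha) * hat_zz + alpha * anova
    return hat_zz, anova, combined


# Hypothetical increments over two grid intervals.
d_xi = [1.0, 2.0]
d_s = [1.0, 1.0]
rho = [0.5, 1.0]
hat_zz, anova, tilde_zz = residual_qv_estimators(d_xi, d_s, rho, alpha=0.5)

# Identity (2.13): for alpha = 1/2 the combination equals [Xi, Zhat].
cross = sum(a * (a - r * b) for a, b, r in zip(d_xi, d_s, rho))
```

In this toy example the $\alpha = 1/2$ combination and the cross term both equal 2.5, as (2.13) requires.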
2.6. Paradigm for asymptotic operations. The asymptotic property of
the estimation error is considered under the following paradigm.

Assumption A (Quadratic variation of time). For each $n \in \mathbb{N}$, we
have a sequence of nonrandom partitions $\mathcal{G}_n = \{t_i^{(n)}\}$,
$\Delta t_i^{(n)} = t_{i+1}^{(n)} - t_i^{(n)}$. Let
$\max_i(\Delta t_i^{(n)}) = \delta^{(n)}$. Suppose that:

(i) $\delta^{(n)} \to 0$ as $n \to \infty$, and
$\delta^{(n)}/\overline{\Delta t}^{(n)} = O(1)$.

(ii) $H^{(n)}(t) = \sum_{t_{i+1}^{(n)} \le t} (\Delta t_i^{(n)})^2 / \overline{\Delta t}^{(n)} \to H(t)$
as $n \to \infty$.

(iii) $H(t)$ is continuously differentiable.

(iv) The bandwidth $h_n$ satisfies
$\sqrt{\overline{\Delta t}^{(n)}}/h_n \to c$, where $0 < c < \infty$.

(v) $[H^{(n)}(t) - H^{(n)}(t - h_n)]/h_n \to H'(t)$, where the convergence
is uniform in t.

When the partitions are evenly spaced, $H(t) = t$ and $H'(t) = 1$. In the
more general case, the left-hand side of (ii) is bounded by
$t\,\delta^{(n)}/\overline{\Delta t}^{(n)}$, while the left-hand side of (v)
is bounded by
$(\delta^{(n)})^2/(\overline{\Delta t}^{(n)} h_n) + \delta^{(n)}/\overline{\Delta t}^{(n)}$.
In all our results, $h_n$ is eventually bigger than
$\overline{\Delta t}^{(n)}$ and, hence, both the left-hand sides are bounded
because of (i). The assumptions in (ii) and (v) are, therefore, about a
unique limit point, and about interchanging limits and differentiation.

Note that we are not assuming that the grids are nested. Also, as discussed
in Section 2.4, how fast $h_n$ and $\overline{\Delta t}^{(n)}$, respectively,
decay has a trade-off in terms of the asymptotic variance of the estimation
error in $\hat\rho$. It is optimal to take
$h_n = O(\sqrt{\overline{\Delta t}^{(n)}})$, whence Assumption A(iv). From
now on, we use $h$ and $h_n$ interchangeably.
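
The "quadratic variation of time" in Assumption A(ii) is easy to compute for a given grid, which can help in judging how far a sampling scheme is from equidistant. The sketch below is illustrative only; it assumes the form $H^{(n)}(t) = \sum_{t_{i+1} \le t} (\Delta t_i)^2 / \overline{\Delta t}$ with $\overline{\Delta t} = T/k$ as in (2.6), and the function name `time_quadratic_variation` is hypothetical.

```python
from typing import Sequence


def time_quadratic_variation(times: Sequence[float], t: float) -> float:
    """Return H_n(t): sum of squared spacings up to t over the mean spacing."""
    k = len(times) - 1
    if k < 1:
        raise ValueError("the grid needs at least two points")
    mean_dt = (times[-1] - times[0]) / k  # average spacing T / k, as in (2.6)
    total = 0.0
    for i in range(k):
        if times[i + 1] > t:  # grid is increasing: stop at the first interval past t
            break
        total += (times[i + 1] - times[i]) ** 2
    return total / mean_dt


even_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
uneven_grid = [0.0, 0.5, 0.75, 1.0]

h_even = time_quadratic_variation(even_grid, 1.0)      # equidistant grid: H(t) = t
h_uneven = time_quadratic_variation(uneven_grid, 1.0)  # larger than t when spacing is uneven
```

For the equidistant grid the function returns t at grid points, matching $H(t) = t$. For the uneven grid above the value at $t = 1$ is 1.125, reflecting the extra weight that long intervals receive.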

2.7. Assumptions on the process structure. The following assumptions 
are frequently imposed on the relevant Ito processes. 

Assumption B(X) (Smoothness). X is an Ito process. Also,
$\langle X,X\rangle'_t$ and $\tilde X_t$ are continuous almost surely.

The addition of Assumption B to an Ito process X, and similar smoothness
assumptions in results below, is partially due to the estimation of $\rho$,
which requires stronger smoothness conditions. In some instances,
Assumption B is partially also a matter of convenience in a proof and can be
dropped at the cost of more involved technical arguments.
2.8. The limit for the discretization error. The error
$\widehat{\langle Z,Z\rangle}^{(\alpha)}_t - \langle Z,Z\rangle_t$ can be
decomposed into bias and pure discretization error
$[Z,Z]_t - \langle Z,Z\rangle_t$. We here discuss the limit result for the
latter, following [32]. We first need the following.

System Assumption III (Description of the filtration). There is a
continuous multidimensional P-local martingale
$\mathcal{X} = (\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(p)})$, for some
finite p, so that $\mathcal{F}_t$ is the smallest sigma-field containing
$\sigma(\mathcal{X}_s, s \le t)$ and $\mathcal{N}$, where $\mathcal{N}$
contains all the null sets in $\sigma(\mathcal{X}_s, s \le T)$.

The final statement in the assumption assures that the "usual conditions"
([33], page 2, [34], page 10) are satisfied. The main implication, however,
is on our mode of convergence, as follows.

Proposition 1 (Discretization theorem). Let Z be an Ito process for
which $\int_0^T (\langle Z,Z\rangle'_t)^2\,dt < \infty$ a.s. and
$\int_0^T \tilde Z_t^2\,dt < \infty$ a.s. Subject to Assumptions A(i)-(ii)
and System Assumptions I and III,

$(\overline{\Delta t}^{(n)})^{-1/2}([Z,Z]^{(n)}_t - \langle Z,Z\rangle_t) \xrightarrow{\mathcal{L}\text{-stable}} \int_0^t \sqrt{2H'(u)}\,\langle Z,Z\rangle'_u\,dW_u,$

where W is a standard Brownian motion, independent of the underlying data
process $\mathcal{X}$.

The symbol $\xrightarrow{\mathcal{L}\text{-stable}}$ denotes stable
convergence of the process, as defined in [39] and [1]; see also [40] and
Section 2 of [32].

In the case of an equidistant grid, the result coincides with the applicable 
part of Theorems 5.1 and 5.5 in [32], and the proof is essentially the same 
(see Section 6). In abstract form, results of this type appear to go back 
to [40]. The Jacod and Protter result was used in financial applications by 
Zhang [44] and Barndorff-Nielsen and Shephard [3]. The case where Zt is 
observed discretely and with additive error is considered in [45] and [46]. 

Note that the conditions on $\langle Z,Z\rangle'$ and $\tilde Z$ are the
same as in the equidistant case, due to the Lipschitz continuity of H. Some
further discussion and results are contained in Section 6.

3. ANOVA for diffusions: main distributional results. 

3.1. Distribution of $[\hat Z,\hat Z]_t - \langle Z,Z\rangle_t$. Recall that
the square bracket $[Z,Z]$ and the angled bracket $\langle Z,Z\rangle$
represent the quadratic variation of Z at discrete and continuous
time-scale, respectively.




P. A. MYKLAND AND L. ZHANG 



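Theorem 1. Under System Assumptions I-II [and, in particular, equation
(1.1)], assume that Assumption A holds. Suppose that $S$, $\Xi$, $\rho$,
$\langle S,S\rangle'$, $\langle \Xi,S\rangle'$,
$\langle R^{SS},R^{SS}\rangle'$, $\langle R^{\Xi S},R^{\Xi S}\rangle'$, and
$\langle R^{\Xi S},R^{SS}\rangle'$ are Ito processes, each satisfying
Assumption B. Let the estimator $[\hat Z,\hat Z]_t$ be defined as in (2.10).
Then, as $n \to \infty$,

(3.1)  $(\overline{\Delta t}^{(n)})^{-1/2}([\hat Z,\hat Z]_t - \langle Z,Z\rangle_t) = D_t + (\overline{\Delta t}^{(n)})^{-1/2}([Z,Z]_t - \langle Z,Z\rangle_t) + o_p(1)$

uniformly in t, where

(3.2)  $D_t = \frac{1}{3c}\int_0^t \langle \rho,\rho\rangle'_u\,d\langle S,S\rangle_u + c\int_0^t H'(u)\,d\langle Z,Z\rangle_u.$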

Remark 1. The consequence of Theorem 1 is that the quantity in (3.1)
converges in law (stably) to

$D_t + \int_0^t \sqrt{2H'(u)}\,\langle Z,Z\rangle'_u\,dW_u;$

the $o_p(1)$ term goes away by Lemma VI.3.31, page 352 in [33].

Note that $D_t$ in (3.2) can be expressed as
$D_t = \int_0^t V_{\hat\rho-\rho}(u)\,d\langle S,S\rangle_u$, where
$V_{\hat\rho-\rho}(t)$ is the asymptotic variance of $\hat\rho_t - \rho_t$;
see (2.9) or [44]. Hence, the (random) variance term for $\hat\rho$ becomes a
bias term for $[\hat Z,\hat Z]$. This is intuitively natural since the
$\hat\rho_t$ are asymptotically independent for different t.

Theorem 1, together with Proposition 1, says that the estimator
$[\hat Z,\hat Z]_t$ converges to $\langle Z,Z\rangle_t$ at the order of the
square root of the average sampling interval. In the limit the error term
consists of a nonnegative bias $D_t$, due to the estimation uncertainty
$[\hat Z,\hat Z] - [Z,Z]$, and a mixture Gaussian, due to the discretization
$[Z,Z]_t - \langle Z,Z\rangle_t$. The nonnegativeness of the asymptotic
bias occurs because the q.v.'s ($\langle \rho,\rho\rangle$,
$\langle S,S\rangle$, $\langle Z,Z\rangle$) are nondecreasing processes.
Furthermore, (3.2) displays a bias-bias tradeoff; thus, an optimal c
for smoothing can be reached to minimize the asymptotic bias, though we
have not investigated the effect of having a random c. The discretization
term is independent of the smoothing factor.
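
The bias-bias tradeoff can be made concrete with a short calculation. The sketch below is an illustration, not a result from the paper: it takes the bias in the form $D_t = A_t/(3c) + c\,B_t$, with the shorthand $A_t$ and $B_t$ introduced here for the two integrals, fixes a single time $t$, and treats $A_t, B_t > 0$ as given (in practice they are random and unknown, which is the "random c" issue mentioned above).

```latex
\begin{align*}
D_t(c) &= \frac{A_t}{3c} + c\,B_t,
\qquad
A_t = \int_0^t \langle \rho,\rho\rangle'_u \, d\langle S,S\rangle_u,
\quad
B_t = \int_0^t H'(u)\, d\langle Z,Z\rangle_u, \\
\frac{\partial D_t}{\partial c} &= -\frac{A_t}{3c^2} + B_t = 0
\;\Longrightarrow\;
c^{*} = \sqrt{\frac{A_t}{3B_t}},
\qquad
D_t(c^{*}) = 2\sqrt{\frac{A_t B_t}{3}} .
\end{align*}
```

Since $\partial^2 D_t/\partial c^2 = 2A_t/(3c^3) > 0$, the stationary point is a minimum. A rapidly varying $\rho$ (large $A_t$) pushes $c^*$ up, that is, toward a shorter window, while a large residual variation (large $B_t$) pushes it down.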

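3.2. Distribution of $\widetilde{\langle Z,Z\rangle}_t - \langle Z,Z\rangle_t$.

Theorem 2. Under System Assumptions I-II, assume that Assumption A holds.
Also assume each of the following processes exists, and is an Ito process
satisfying Assumption B: $\Xi$, $S$, $\rho$, $\langle \Xi,S\rangle'$,
$\langle S,S\rangle'$, $\langle R^{\Xi S},R^{\Xi S}\rangle'$,
$\langle R^{SS},R^{SS}\rangle'$ and $\langle R^{\Xi S},R^{SS}\rangle'$. Also
suppose that the processes $\langle \Xi,\rho\rangle'$ and
$\langle S,\rho\rangle'$ are continuous. Then, uniformly in t,

(3.3)  $(\overline{\Delta t}^{(n)})^{-1/2}(\widetilde{\langle Z,Z\rangle}_t - \langle Z,Z\rangle_t) = \frac{1}{2c}\int_0^t \langle \Xi,S\rangle'_u\,d\rho_u + (\overline{\Delta t}^{(n)})^{-1/2}([Z,Z]_t - \langle Z,Z\rangle_t) + o_p(1).$

Remark 1 applies similarly.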

Unlike $[\hat Z,\hat Z]$, the asymptotic (conditional) bias associated with
$\widetilde{\langle Z,Z\rangle}_t$ does not necessarily have a positive or
negative sign. Moreover, we are no longer faced with a bias-bias tradeoff
due to the position of c in (3.3). In this case the role of smoothing in the
asymptotic bias will be discussed in Section 3.3.

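3.3. General results for the $\widehat{\langle Z,Z\rangle}^{(\alpha)}_t$
class of estimators. From (2.12),

$\widehat{\langle Z,Z\rangle}^{(\alpha)}_t = (1 - 2\alpha)[\hat Z,\hat Z]_t + 2\alpha\,\widetilde{\langle Z,Z\rangle}_t,$

and it follows from the assumptions of Theorems 1 and 2 that, if one sets

(3.4)  $\mathrm{bias}^{(\alpha)}_t = \frac{\alpha}{c}\int_0^t \langle \Xi,S\rangle'_u\,d\rho_u + (1 - 2\alpha)D_t,$

then, as in Remark 1,

(3.5)  $(\overline{\Delta t}^{(n)})^{-1/2}(\widehat{\langle Z,Z\rangle}^{(\alpha)}_t - \langle Z,Z\rangle_t) = \mathrm{bias}^{(\alpha)}_t + (\overline{\Delta t}^{(n)})^{-1/2}([Z,Z]^{(n)}_t - \langle Z,Z\rangle_t) + o_p(1) \xrightarrow{\mathcal{L}\text{-stable}} \mathrm{bias}^{(\alpha)}_t + \int_0^t \sqrt{2H'(u)}\,\langle Z,Z\rangle'_u\,dW_u.$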

In summary, for any linear combination of the estimators in Theorems
1 and 2, $\alpha \in [0,1]$, the convergence in (3.5) is in law as a process,
and the limiting Brownian motion W is independent of the entire data
process. For details of stable convergence, see the discussion and
references in Section 2.8 above.

The "variance" term
$(\overline{\Delta t}^{(n)})^{-1/2}([Z,Z]_t - \langle Z,Z\rangle_t)$ is the
same for any
estimator in the linear-combination class, and they are all asymptotically 
perfectly correlated. The common asymptotic, conditional variance is in- 
dependent of the smoothing bandwidth. It remains unclear whether the 
common asymptotic variance could, perhaps, be a lower bound under the 
nonparametric setting (see [5] for a comprehensive discussion). This needs 
further investigation. 
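Table 1
The effect of constant $\rho$ on the bias components

Estimator                                     Asymptotic bias
$[\hat Z,\hat Z]_t$                           $c\int_0^t H'(u)\,d\langle Z,Z\rangle_u$
$\widetilde{\langle Z,Z\rangle}_t$            $0$
$\widehat{\langle Z,Z\rangle}^{(1)}_t$        $-c\int_0^t H'(u)\,d\langle Z,Z\rangle_u$
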
For the bias, on the other hand, the estimation procedure plays an
important role, as the bias term varies with $\alpha$. Also the smoothing
effect enters the bias terms. From Theorems 1 and 2, excessive
over-smoothing (smaller c) or under-smoothing (bigger c) can explode the
bias of $\widehat{\langle Z,Z\rangle}^{(\alpha)}$, for $\alpha \ne 1/2$; thus
the (conditional) bias may be minimized optimally. When $\alpha = 1/2$, it is
not obvious how to deal with the bias-bias tradeoff. One might theoretically
be able to reduce the bias for $\widetilde{\langle Z,Z\rangle}$ [i.e.,
$\widehat{\langle Z,Z\rangle}^{(1/2)}$] by choosing the smallest possible
bandwidth h. This thought should, however, be taken with caution. It is
not obvious whether the magnitude of the higher-order terms in the earlier
results would remain negligible if the estimation window h were to decrease
faster than the order $\sqrt{\overline{\Delta t}}$.

Table 1 shows that assuming constant $\rho$,
$\widetilde{\langle Z,Z\rangle}_t$ will be the best choice among the three.
When $\rho$ is random, none of the estimation schemes in Section 2.5 is
obviously superior to the others.

3.4. Estimating the asymptotic distribution. For statistical inference
concerning $\langle Z,Z\rangle$, one needs, in view of the above, to estimate
the asymptotic (random) bias and variance. The bias term can be obtained by
substitution of estimated quantities into the relevant expressions, most
generally (3.4). We shall here be concerned with the variance term in (3.5).
In view of the stable convergence in Proposition 1, we therefore seek to
estimate $\int_0^t 2H'(u)(\langle Z,Z\rangle'_u)^2\,du$.

By a modification of Barndorff-Nielsen and Shephard [3], and also
Mykland [37], one can do this by considering the fourth-order variation.

Definition 4 (Observed fourth-order variation). For an Ito process
X observed on a grid $\mathcal{G}$,

(3.6)  $[X,X,X,X]_t = \sum_{t_{i+1} \le t} (\Delta X_{t_i})^4.$

Proposition 2 (Estimation of variance). Assume the regularity
conditions of Theorem 1, and let $\hat Z$ be defined as in that result. Also
assume System Assumption III. Then, as $n \to \infty$,

(3.7)  $\frac{2}{3}(\overline{\Delta t}^{(n)})^{-1}[\hat Z,\hat Z,\hat Z,\hat Z]_t \to \int_0^t 2H'(u)(\langle Z,Z\rangle'_u)^2\,du$

uniformly in probability.

This estimate of variance can be used in connection with Sections 4.2 and 4.3 
below. 

The proof is given in Section 6. It can be noted from there that the same
statement (3.7) would hold under weaker conditions if $\hat Z$ were replaced
by $Z$, as follows.

Remark 2. Assume the conditions of Proposition 1, and also that
$|\tilde Z|$ and $\langle Z,Z\rangle'$ are bounded a.s. Then, as
$n \to \infty$,

(3.8)  $\frac{2}{3}(\overline{\Delta t}^{(n)})^{-1}[Z,Z,Z,Z]_t \to \int_0^t 2H'(u)(\langle Z,Z\rangle'_u)^2\,du$

uniformly in probability.

This generalizes the corresponding result at the end of page 270 in [3]. 
The finding in [37] is exact for small samples in the context of explicit 
embedding, where it follows from Bartlett identities. For another use of this 
methodology, see, for example, the proof of Lemma 1 in [35]. 
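
The variance estimate based on the fourth-order variation can be sketched numerically. The code below is an illustration under stated assumptions, not the paper's procedure: it uses the normalization $\frac{2}{3}(\overline{\Delta t})^{-1}[Z,Z,Z,Z]_t$ as the estimate of $\int_0^t 2H'(u)(\langle Z,Z\rangle'_u)^2\,du$, all function names are hypothetical, and the simulated path is a standard Brownian motion on an equidistant grid, for which the target equals 2 at $t = 1$.

```python
import math
import random
from typing import List, Sequence, Tuple


def fourth_order_variation(times: Sequence[float], z: Sequence[float], t: float) -> float:
    """Return [Z,Z,Z,Z]_t: sum of fourth powers of increments with t_{i+1} <= t."""
    total = 0.0
    for i in range(len(times) - 1):
        if times[i + 1] > t:  # grid is increasing: stop at the first interval past t
            break
        total += (z[i + 1] - z[i]) ** 4
    return total


def discretization_variance_estimate(times: Sequence[float], z: Sequence[float],
                                     t: float) -> float:
    """Estimate the asymptotic variance as (2/3) * [Z,Z,Z,Z]_t / mean spacing."""
    k = len(times) - 1
    if k < 1:
        raise ValueError("the grid needs at least two points")
    mean_dt = (times[-1] - times[0]) / k
    return (2.0 / 3.0) * fourth_order_variation(times, z, t) / mean_dt


def simulate_brownian_path(n: int, seed: int) -> Tuple[List[float], List[float]]:
    """Simulate a standard Brownian motion on [0, 1] at n equidistant steps."""
    rng = random.Random(seed)
    times = [i / n for i in range(n + 1)]
    z = [0.0]
    step_sd = math.sqrt(1.0 / n)
    for _ in range(n):
        z.append(z[-1] + rng.gauss(0.0, step_sd))
    return times, z


# Deterministic toy check: two increments of size 1 on a grid with spacing 0.5.
toy_estimate = discretization_variance_estimate([0.0, 0.5, 1.0], [0.0, 1.0, 0.0], 1.0)

# Simulated check: the estimate should be near 2 for a standard Brownian motion.
sim_times, sim_z = simulate_brownian_path(20000, seed=0)
sim_estimate = discretization_variance_estimate(sim_times, sim_z, sim_times[-1])
```

Under Proposition 1, and ignoring the bias terms discussed in Section 3, one hedged way to use this number is as the basis of an approximate standard error for the observed quadratic variation, namely the square root of $\overline{\Delta t}$ times the estimate.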

4. Goodness of fit. The purpose of ANOVA is to assess the goodness
of fit of a regression model of the form (1.1). We here illustrate the use
of Theorems 1 and 2 by considering two different questions of this type. In
the first section we discuss how to assess the fit of a parametric estimator
for $\rho$. Afterward, we focus on the issue of how good the one regressor
model itself is, independently of estimation techniques. This is already
measured by the quantity $\langle Z,Z\rangle_t$, but can be further studied
by considering confidence bands for $\langle Z,Z\rangle_t$ as a process, and
by an analogue to the coefficient of determination. Finally, we discuss the
question of the relationship between this ANOVA and the analysis of variance
that is used in the standard regression setting.

4.1. The assessment of parametric models. In the following we suppose
that a parametric model is fit to the data, and $\rho$ is estimated as a function of
the parameter. Parametric estimation of discretely observed diffusions has
been studied by Genon-Catalot and Jacod [18], Genon-Catalot, Jeantheau
and Laredo [19, 20], Gloter [22], Gloter and Jacod [23], Barndorff-Nielsen
and Shephard [2], Bibby, Jacobsen and Sørensen [4], Elerian, Siddhartha and
Shephard [13], Jacobsen [29], Sørensen [42] and Hoffmann [27]. This is, of
course, only a small sample of the literature available. Also, these references
only concern the type of asymptotics considered in this paper, where $[0,T]$
is fixed and $\Delta t \to 0$, and there is also a substantial literature on the case
where $\Delta t$ is fixed and $T \to \infty$.

In Section 3 we have studied the nonparametric estimate
$\widehat{\langle Z, Z\rangle}^{(\alpha)}_t$ for the residual sum of squares. We here compare
$\widehat{\langle Z, Z\rangle}^{(\alpha)}_T$ to its parametric counterpart to see how good the
parametric model is in capturing the true regression of $\Xi$ on $S$.

Specifically, we suppose that data from the multidimensional process $X_t$
is observed at the grid points. $X_t$ has among its components at least $S_t$ and
$\Xi_t$, and possibly also other processes. The parametric model is of the form
$P_{\theta,\psi}$, $\theta \in \Theta$, $\psi \in \Psi$, where the modeling is such that diffusion
coefficients are functions of $\theta$, while drift coefficients can be functions of
both $\theta$ and $\psi$. It is thus reasonable to suppose that as $\Delta t \to 0$, $\hat\theta$
converges in probability to a nonrandom parameter value $\theta_0$ and that

$(\overline{\Delta t})^{-1/2}(\hat\theta - \theta_0) \to \eta\,N(0,1)$

in law stably, where $\eta$ is a function of the data and the $N(0,1)$ term is
independent of the data. (For conditions under which this occurs, consult,
e.g., the references cited above.) $\theta_0$ is the true value of the parameter if the
model does contain the true probability, but is otherwise also taken to be a
defined parameter.

Under $P_{\theta,\psi}$, the regression coefficient $\rho_t$ is of the form $\beta_t(\theta)$. Most
commonly, $\beta_t(\theta) = b(X_t; \theta)$ for a nonrandom functional $b$.

We now ask whether the true regression coefficient can be correctly
estimated with the model at hand. In other words, we wish to test the null
hypothesis $H_0$ that $\beta_t(\theta_0) = \rho_t$.

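For the ANOVA analysis, define the theoretical residual by

$dV_t = d\Xi_t - \beta_t(\theta_0)\,dS_t, \quad V_0 = 0,$

and the observed one by

$\Delta \hat V_{t_i} = \Delta \Xi_{t_i} - \beta_{t_i}(\hat\theta)\,\Delta S_{t_i}, \quad \hat V_0 = 0.$

Under the null hypothesis $\langle V, V\rangle = \langle Z, Z\rangle$, and so a natural test statistic is
of the form

$U = (\overline{\Delta t})^{-1/2}\big([\hat V, \hat V]_T - \widehat{\langle Z, Z\rangle}^{(\alpha)}_T\big).$

We now derive the null distribution for $U$, using the results above.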
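As an intermediate step, define the discretized theoretical residual

$\Delta V^d_{t_i} = \Delta \Xi_{t_i} - \beta_{t_i}(\theta_0)\,\Delta S_{t_i}, \quad V^d_0 = 0.$

Subject to obvious regularity conditions,

$[\hat V, \hat V]_t - [V^d, V^d]_t$
$\quad = \sum_{t_{i+1} \le t} \big((\Delta \hat V_{t_i})^2 - (\Delta V^d_{t_i})^2\big)$
$\quad = \sum_{t_{i+1} \le t} \big(-2(\beta_{t_i}(\hat\theta) - \beta_{t_i}(\theta_0))\,\Delta V^d_{t_i}\,\Delta S_{t_i} + (\beta_{t_i}(\hat\theta) - \beta_{t_i}(\theta_0))^2 (\Delta S_{t_i})^2\big)$
$\quad = -2(\hat\theta - \theta_0) \sum_{t_{i+1} \le t} \frac{\partial \beta_{t_i}}{\partial\theta}(\theta_0)\,\Delta V^d_{t_i}\,\Delta S_{t_i} + O_p(\overline{\Delta t})$
$\quad = -2(\hat\theta - \theta_0) \int_0^t \frac{\partial \beta_u}{\partial\theta}(\theta_0)\,d\langle V, S\rangle_u + O_p(\overline{\Delta t}).$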

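Also, under the conditions in Proposition 3 in Section 6 (cf. also the proof of
Proposition 2 in the same section), $[V^d, V^d]_T = [V, V]_T + o_p(\overline{\Delta t}^{1/2})$ as $\Delta t \to 0$
[since $\langle V^d, V^d\rangle_t = \langle V, V\rangle_t + o_p(\overline{\Delta t}^{1/2})$, and $\langle V^d, V^d\rangle'_t \approx \langle V^d, V\rangle'_t \approx \langle V, V\rangle'_t$].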
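Hence, under the conditions of Theorem 1 or 2,

$U = -2(\overline{\Delta t})^{-1/2}(\hat\theta - \theta_0) \int_0^T \frac{\partial \beta_t}{\partial\theta}(\theta_0)\,d\langle V, S\rangle_t$
$\quad + (\overline{\Delta t})^{-1/2}\big([V, V]_T - [Z, Z]_T\big) - \mathrm{bias}^{(\alpha)}_T + o_p(1),$

where $\mathrm{bias}^{(\alpha)}_T$ has the same meaning as in Section 3. If the null hypothesis
is satisfied, therefore,

$U \to N(0,1) \times 2\eta \int_0^T \frac{\partial \beta_t}{\partial\theta}(\theta_0)\,d\langle V, S\rangle_t - \mathrm{bias}^{(\alpha)}_T$

in law stably. The variance and bias can be estimated from the data. This,
then, provides the null distribution for $U$.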
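
The statistic $U$ can be assembled from discretely observed data once the
nonparametric estimate is available. The sketch below is illustrative only:
the function names, the argument `beta_hat` (the fitted coefficient values at
the grid points) and the argument `nonparametric_qv` (the estimate of the
residual quadratic variation from Section 3, assumed to be computed elsewhere)
are hypothetical, and the average spacing is assumed to be the horizon divided
by the number of increments.

```python
def parametric_residual_qv(xi, s, beta_hat):
    """Sum of squared increments of the observed parametric residual,
    where each increment is dXi - beta_hat * dS on one grid interval."""
    total = 0.0
    for i in range(len(xi) - 1):
        d_xi = xi[i + 1] - xi[i]
        d_s = s[i + 1] - s[i]
        total += (d_xi - beta_hat[i] * d_s) ** 2
    return total


def anova_test_statistic(xi, s, beta_hat, nonparametric_qv, horizon):
    """U = (avg spacing)^(-1/2) * (parametric residual QV - nonparametric estimate)."""
    avg_dt = horizon / (len(xi) - 1)
    return (parametric_residual_qv(xi, s, beta_hat) - nonparametric_qv) / avg_dt ** 0.5


# Toy data: Xi moves exactly twice as much as S and the fitted coefficient is 2,
# so the parametric residual has (numerically) zero quadratic variation.
s = [0.0, 0.1, -0.2, 0.3, 0.25]
xi = [2.0 * value for value in s]
beta_hat = [2.0] * (len(s) - 1)
print(parametric_residual_qv(xi, s, beta_hat))
print(anova_test_statistic(xi, s, beta_hat, nonparametric_qv=0.0, horizon=1.0))
```

In practice the value of $U$ would be compared with the null distribution
above after plugging in estimates of the variance and bias.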

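Another approach is to use $U$ to measure how close the parametric residual
$\langle V, V\rangle$ is to the lower bound $\langle Z, Z\rangle$. To first order,

$(\overline{\Delta t})^{1/2}\,U \approx \langle V, V\rangle_T - \langle Z, Z\rangle_T = \int_0^T (\beta_t(\theta_0) - \rho_t)^2\,d\langle S, S\rangle_t.$

The behavior of $U - (\overline{\Delta t})^{-1/2}(\langle V, V\rangle_T - \langle Z, Z\rangle_T)$ depends on the joint
limiting distribution of $([V, V]_T - \langle V, V\rangle_T) - ([Z, Z]_T - \langle Z, Z\rangle_T)$ and
$(\overline{\Delta t})^{-1/2}(\hat\theta - \theta_0)$. The former can be provided by Proposition 1 in
Section 2.8 (or Section 5 of [32]), but further assumptions are needed to
obtain the joint distribution.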
A study of this is beyond the scope of this paper.

4.2. Confidence bands. In addition to providing pointwise confidence
intervals for $\widehat{\langle Z, Z\rangle}^{(\alpha)}_t$, we can also construct joint confidence bands for the
estimated quadratic variation $\widehat{\langle Z, Z\rangle}^{(\alpha)}$ of residuals. This is possible because
$\widehat{\langle Z, Z\rangle}^{(\alpha)}$ converges as a process by Theorems 1 and 2.

One proceeds as follows. As a process on $[0, T]$,

$(\overline{\Delta t}^{(n)})^{-1/2}\big(\widehat{\langle Z, Z\rangle}^{(\alpha)}_t - \langle Z, Z\rangle_t\big) \to \mathrm{bias}^{(\alpha)}_t + L_t.$

Under all estimation schemes in the linear combination class, we have, by
Theorems 1 and 2 and subsequent results on $\widehat{\langle Z, Z\rangle}^{(\alpha)}_t$,

$L_t = \int_0^t \sqrt{2H'(u)}\,\langle Z, Z\rangle'_u\,dW_u,$

where $W$ is a standard Brownian motion independent of the complete data
filtration. Now condition on $\mathcal{F}_T$: by the stable convergence, $L_t$ is then a
Gaussian process, with $\langle L, L\rangle_t$ nonrandom. Use the change-of-time construction
of Dambis [7] and Dubins and Schwarz [10] to obtain $L_t = W^*_{\langle L, L\rangle_t}$, where
$W^*$ is a new Brownian motion conditional on $\mathcal{F}_T$. It then follows that

$\max_{0 \le t \le T} L_t = \max_{0 \le t \le 2\int_0^T H'(u)(\langle Z, Z\rangle'_u)^2\,du} W^*_t,$

$\min_{0 \le t \le T} L_t = \min_{0 \le t \le 2\int_0^T H'(u)(\langle Z, Z\rangle'_u)^2\,du} W^*_t.$

Now write $L_n(t) = (\overline{\Delta t}^{(n)})^{-1/2}\big(\widehat{\langle Z, Z\rangle}^{(\alpha)}_t - \langle Z, Z\rangle_t\big) - \mathrm{bias}^{(\alpha)}_t$. We have

$P(|L_n(t)| \le c \text{ for all } t \in [0,T]) \to P(|L(t)| \le c \text{ for all } t \in [0,T])$
$\quad = P\big(\min_{0 \le t \le \tau} W^*_t \ge -c,\ \max_{0 \le t \le \tau} W^*_t \le c\big).$

Choose $c = c_\tau$ such that

$P\big(\min_{0 \le t \le \tau} W_t \ge -c_\tau,\ \max_{0 \le t \le \tau} W_t \le c_\tau \mid \tau\big) = 1 - \alpha,$

with $\tau = 2\int_0^T H'(u)(\langle Z, Z\rangle'_u)^2\,du$. To find $c_\tau$, one can refer to Section 2.8 in [34]
for the distributions of the running minimum and maximum of a Brownian
motion. $\tau$ itself can be estimated by using Proposition 2. This completes our
construction of a global confidence band.
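
The constant $c_\tau$ can also be computed numerically. The sketch below is
one possible way to do so and is not taken from [34]: it uses the classical
series for the probability that a standard Brownian motion stays inside
$(-c, c)$ up to time $\tau$,
$P(\max_{0 \le t \le \tau} |W_t| < c) = \frac{4}{\pi} \sum_{k \ge 0} \frac{(-1)^k}{2k+1} \exp\big(-(2k+1)^2 \pi^2 \tau / (8c^2)\big)$,
and solves for $c$ by bisection. The function names, the truncation at 200
terms and the bracketing interval are assumptions made for illustration; in
practice $\tau$ would be replaced by its estimate from Proposition 2.

```python
import math


def stay_probability(c, tau, n_terms=200):
    """P(|W_t| < c for all t in [0, tau]) for a standard Brownian motion W."""
    x = math.pi ** 2 * tau / (8.0 * c ** 2)
    total = 0.0
    for k in range(n_terms):
        odd = 2 * k + 1
        total += (-1) ** k / odd * math.exp(-odd ** 2 * x)
    return 4.0 / math.pi * total


def band_constant(tau, alpha, tol=1e-10):
    """Solve stay_probability(c, tau) = 1 - alpha for c by bisection."""
    scale = math.sqrt(tau)
    lo, hi = 0.1 * scale, 10.0 * scale  # probability is ~0 at lo and ~1 at hi
    while hi - lo > tol * scale:
        mid = 0.5 * (lo + hi)
        if stay_probability(mid, tau) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c_unit = band_constant(tau=1.0, alpha=0.05)
print(c_unit)
```

By Brownian scaling, the constant for general $\tau$ is $\sqrt{\tau}$ times the
constant for $\tau = 1$. The band for $\langle Z, Z\rangle_t$ then consists of the values
within $(\overline{\Delta t}^{(n)})^{1/2} c_\tau$ of the bias-corrected estimate, simultaneously for
all $t$ in $[0, T]$.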

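4.3. The coefficient of determination $R^2$. In analogy with standard
linear regression, one can define $R^2_t$ by

$R^2_t = 1 - \frac{\langle Z, Z\rangle_t}{\langle \Xi, \Xi\rangle_t}.$

This quantity would have been observed if the whole paths of the processes
$\Xi$ and $S$ had been available. If observations are on a grid, it is natural to
use

$\hat R^2_t = 1 - \frac{\widehat{\langle Z, Z\rangle}^{(\alpha)}_t}{[\Xi, \Xi]_t}.$

Under the assumptions of Section 2, the distribution of $\hat R^2_t$ can be found by

$(\overline{\Delta t}^{(n)})^{-1/2}(\hat R^2_t - R^2_t)$
$\quad = (\overline{\Delta t}^{(n)})^{-1/2}\left(-\frac{\widehat{\langle Z, Z\rangle}^{(\alpha)}_t - \langle Z, Z\rangle_t}{\langle \Xi, \Xi\rangle_t} + (1 - R^2_t)\frac{[\Xi, \Xi]_t - \langle \Xi, \Xi\rangle_t}{\langle \Xi, \Xi\rangle_t}\right) + o_p(1)$
$\quad = (\overline{\Delta t}^{(n)})^{-1/2} \times \left(-\frac{[Z, Z]_t - \langle Z, Z\rangle_t}{\langle \Xi, \Xi\rangle_t} + (1 - R^2_t)\frac{[\Xi, \Xi]_t - \langle \Xi, \Xi\rangle_t}{\langle \Xi, \Xi\rangle_t}\right) - \frac{\mathrm{bias}^{(\alpha)}_t}{\langle \Xi, \Xi\rangle_t} + o_p(1),$

where $\mathrm{bias}^{(\alpha)}_t$ is the bias corresponding to the estimator $\widehat{\langle Z, Z\rangle}^{(\alpha)}_t$.

A straightforward generalization of Proposition 1 yields that
$(\overline{\Delta t}^{(n)})^{-1/2} \times ([Z, Z]_t - \langle Z, Z\rangle_t,\ [\Xi, \Xi]_t - \langle \Xi, \Xi\rangle_t)_{0 \le t \le T}$ converges (stably) to a
process with (bivariate) quadratic variation $\int_0^t g_u\,du$, where

$g_u = 2H'(u) \begin{pmatrix} (\langle Z, Z\rangle'_u)^2 & (\langle Z, \Xi\rangle'_u)^2 \\ (\langle Z, \Xi\rangle'_u)^2 & (\langle \Xi, \Xi\rangle'_u)^2 \end{pmatrix},$

and equation (1.1) yields that $\langle Z, \Xi\rangle'_t = \langle Z, Z\rangle'_t$. It follows that

$(\overline{\Delta t}^{(n)})^{-1/2}(\hat R^2_t - R^2_t) \to -\frac{\mathrm{bias}^{(\alpha)}_t}{\langle \Xi, \Xi\rangle_t} - \frac{R^2_t}{\langle \Xi, \Xi\rangle_t} \int_0^t \sqrt{2H'(u)}\,\langle Z, Z\rangle'_u\,dW_u$
$\quad + \frac{1 - R^2_t}{\langle \Xi, \Xi\rangle_t} \int_0^t \sqrt{2H'(u)\,[(\langle \Xi, \Xi\rangle'_u)^2 - (\langle Z, Z\rangle'_u)^2]}\,dW^*_u$

in law stably, where $W$ and $W^*$ are independent Brownian motions. For fixed $t$,
the limit is conditionally normal, with mean $-\mathrm{bias}^{(\alpha)}_t / \langle \Xi, \Xi\rangle_t$ and variance

$\frac{1}{\langle \Xi, \Xi\rangle_t^2} R_t^4 \int_0^t 2H'(u)(\langle Z, Z\rangle'_u)^2\,du$
$\quad + \frac{1}{\langle \Xi, \Xi\rangle_t^2}(1 - R^2_t)^2 \int_0^t 2H'(u)\,[(\langle \Xi, \Xi\rangle'_u)^2 - (\langle Z, Z\rangle'_u)^2]\,du,$

which can be readily estimated using the device discussed in Section 3.4.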
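
For intuition, an estimated coefficient of determination can be computed
along the following lines. This is a hedged sketch rather than the exact
estimator of the paper: it uses a rolling window of `window` past increments
to estimate the regression coefficient at each grid point (a simplified
stand-in for the estimator of Section 2), takes the plain sum of squared
estimated residual increments as the estimate of the residual quadratic
variation, and ignores the bias correction. All function and variable names
are hypothetical, and the data are simulated with a constant coefficient 2.

```python
import random


def rolling_rho(d_xi, d_s, window):
    """Rolling estimate of rho before each increment, from the previous
    `window` increments: sum(dXi * dS) / sum(dS^2). None until the window fills."""
    rho = [None] * len(d_s)
    for i in range(window, len(d_s)):
        num = sum(d_xi[j] * d_s[j] for j in range(i - window, i))
        den = sum(d_s[j] ** 2 for j in range(i - window, i))
        rho[i] = num / den
    return rho


def estimated_r_squared(xi, s, window):
    """1 - (residual QV) / (total QV), both summed over increments with a full window."""
    d_xi = [b - a for a, b in zip(xi, xi[1:])]
    d_s = [b - a for a, b in zip(s, s[1:])]
    rho = rolling_rho(d_xi, d_s, window)
    resid_qv = sum((d_xi[i] - rho[i] * d_s[i]) ** 2 for i in range(window, len(d_s)))
    total_qv = sum(d_xi[i] ** 2 for i in range(window, len(d_s)))
    return 1.0 - resid_qv / total_qv


# Simulated example: dXi = 2 dS + dZ with independent Brownian S and Z,
# unit volatility for S and volatility 0.5 for Z, so the true R^2 is 1 - 0.25 / 4.25.
rng = random.Random(7)
n, dt = 5000, 1.0 / 5000
s, xi = [0.0], [0.0]
for _ in range(n):
    step_s = rng.gauss(0.0, dt ** 0.5)
    step_z = rng.gauss(0.0, 0.5 * dt ** 0.5)
    s.append(s[-1] + step_s)
    xi.append(xi[-1] + 2.0 * step_s + step_z)

r2_hat = estimated_r_squared(xi, s, window=70)
print(r2_hat, 1.0 - 0.25 / 4.25)
```

The window of 70 increments is of the order of the square root of the sample
size, in the spirit of the bandwidth choice used in the proofs; this choice
is an assumption of the sketch.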

4.4. Variance versus variation: Which ANOVA? The formulation of (1.6) 
is in terms of quadratic variation. This raises the question of how our analy- 
sis relates to the traditional meaning of ANOVA, namely, a decomposition of 
variance. There are several answers to this, some concerning the broad set- 
ting provided by model (1.1), and they are discussed presently. More specific 
structure is provided by financial applications, and a discussion is provided 
in Section 4.5. 

In model (1.1), the variation in $Z$ can come from both the drift and
martingale components. As in (2.2),

(4.1)   $Z_t = Z_0 + Z^{DR}_t + Z^{MG}_t.$

Our analysis concerns most directly the variation in $Z^{MG}$, in that
$\mathrm{var}(Z^{MG}_t) = E(\langle Z, Z\rangle_t)$, where it should be noted that $\langle Z, Z\rangle_t = \langle Z^{MG}, Z^{MG}\rangle_t$.
Hence, if the $Z^{DR}$ term is identically zero, the analysis of variation is an
exact analysis of variance, in terms of expectations. The quadratic variation,
however, is also a more relevant measure of variation for the data that were
actually collected. The Dambis [7] and Dubins and Schwarz [10] representation
yields that $Z^{MG}_t = V_{\langle Z, Z\rangle_t}$, where $V$ is a standard Brownian motion on a different
time scale. Therefore, $\langle Z, Z\rangle_t = \langle Z^{MG}, Z^{MG}\rangle_t$ contains information about
the actual amount of variation that has occurred in the process $Z^{MG}$. Using
the quadratic variation is, in this sense, analogous to using observed
information in a likelihood setting (see, e.g., [12]). The analogy is valid also on
the technical level: if one forms the dual likelihood ([36]) from score function
$Z^{MG}_t$, the observed information is, indeed, $\langle Z, Z\rangle_t$.

If the drift $Z^{DR}$ in (4.1) is nonzero, the analysis applies directly only
to $Z^{MG}$. So long as $T$ is small or moderate, however, the variability in $Z^{MG}$
is the main part of the variability in $Z$. Specifically, both when $T \to 0$
and $T \to +\infty$, $Z^{MG}_T = O_p(T^{1/2})$ and $Z^{DR}_T = O_p(T)$. Thus, the bias due to
analyzing $Z^{MG}_t$ in lieu of $Z_t$ becomes a problem only for large $T$. At the same
time the present methods provide estimates for the variation in $Z^{MG}$ for
small and moderate $T$, whereas the variation in $Z^{DR}$ can only be consistently
estimated when $T \to +\infty$ (by Girsanov's theorem). Thus, we recommend
our current methods for moderate $T$, while one should use other approaches
when dealing with a time span $T$ that is long.
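
A worked special case may make these orders of magnitude concrete. The
constants $\mu$ and $\sigma > 0$ and the Brownian motion $W$ below are illustrative
assumptions and are not part of model (1.1).

```latex
Z_t = Z_0 + \mu t + \sigma W_t
\quad\Longrightarrow\quad
Z^{DR}_T = \mu T,\qquad
Z^{MG}_T = \sigma W_T,\qquad
\langle Z, Z\rangle_T = \sigma^2 T = \operatorname{var}(Z^{MG}_T),
\qquad
\frac{|Z^{DR}_T|}{\sqrt{\operatorname{var}(Z^{MG}_T)}} = \frac{|\mu|}{\sigma}\sqrt{T}.
```

In this case the ratio of the drift to the standard deviation of the
martingale part grows like $\sqrt{T}$, so the drift is negligible for small $T$
and dominant for large $T$.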

4.5. Financial applications: An instance where variance and variation
relate exactly. It is quite common in finance to encounter the case from
Section 4.4, where $Z$ itself is a martingale, or where one is interested in $Z^{MG}$
only. We here show a conceptual example of this, where one wants to test
whether the residual $Z$ is zero, or study the distribution of the residual under
the so-called Risk Neutral or Equivalent Martingale Measure $P^*$.

If P is the true, actual (physical) probability distribution under which 
data is collected, P* is, by contrast, a probability measure equivalent to 
P in the sense of mutual absolute continuity, and it satisfies the condition 
that the discounted value of all traded securities must be P* -martingales. 
The values of financial assets, consequently, are expectations under P* . For 
further details, refer to [8, 9, 11, 24, 25, 28]. 

If the residual $Z$ relates to the value of a security, one is often interested
in its behavior under $P^*$, rather than under $P$. Specifically, we shall see
that one is interested in $Z^{MG*}$, where this is the martingale part in the
Doob-Meyer decomposition (4.1), when taken under $P^*$:

(4.2)   $Z_t = Z_0 + Z^{DR*}_t + Z^{MG*}_t$ w.r.t. $P^*$.

The quadratic variation $\langle Z, Z\rangle = \langle Z^{MG}, Z^{MG}\rangle = \langle Z^{MG*}, Z^{MG*}\rangle$ is the same
under $P$ and $P^*$, but, under the latter distribution it refers to the behavior
of $Z^{MG*}$ rather than $Z^{MG}$.

A simple example follows from the motivating application in the Introduction.
Suppose that $\Xi$ and $S$ are both discounted securities prices, and that one
seeks to offset risk in $\Xi$ by holding $\rho$ units of $S$. The residual is then, itself,
the discounted value on the unhedged part of $\Xi$. Under $P^*$, therefore, $Z$ is a
martingale, $Z_t = Z_0 + Z^{MG*}_t$. A deeper example is encountered in [44], where
we analyze implied volatilities. In both these cases, in order to put a value
on the risk involved, one is interested in bounds on the quadratic
variation $\langle Z, Z\rangle$, under $P^*$. This will help, for example, in pricing spread
options on $Z$.

How do our results for probability P relate to P*? They simply carry 
over, unchanged, to this probability distribution. Theorems 1 and 2 remain 
valid by absolute continuity of P* under P. In the case of limiting results, 
such as those in Propositions 1 and 3 (in Section 6) and the development for 
goodness of fit in Section 4, we also invoke the mode of stable convergence 
(Section 2.8) together with the fact that $dP^*/dP$ is measurable with respect
to the underlying $\sigma$-field $\mathcal{F}_T$.

Finally, if one wants to test a null hypothesis $H_0$ that $Z^{MG*}$ is constant,
then $H_0$ is equivalent to asking whether $\langle Z, Z\rangle_T$ is zero (whether under $P$
or $P^*$). This can again be answered with our distributional results above. In
the case of the example in this section, the $H_0$ of fully offsetting the risk in
$\Xi$ also tests whether $Z$ itself is constant.

5. Conclusion. This paper provides a methodology to analyze the
association between Itô processes. Under the framework of nonparametric,
one-factor regression, we obtain the distributions of estimators of residual
variation $\langle Z, Z\rangle$. We then use this in a variety of measures of goodness of
fit. We also show how the method yields a procedure to test the
appropriateness of a parametric regression model. The limiting distributions identify
two sources of uncertainty, one from the discrete nature of the data process,
the other from the estimation procedure. Interestingly, among the class of
estimators $\widehat{\langle Z, Z\rangle}^{(\alpha)}$ under consideration, $\alpha \in [0,1]$, discrete-time sampling
only impacts the "variance" component. On the other hand, different
estimation schemes lead to different biases in the asymptotics.

ANOVA for diffusions permits inference over a time interval. This is
because the error terms in the quadratic variation $\widehat{\langle Z, Z\rangle}^{(\alpha)}$ of residuals and,
hence, the error terms in the goodness of fit measures, converge as a process,
whereas the errors in the estimated regression parameters $\hat\rho_t$ are
asymptotically independent from one time point to the next. This feature of time
aggregation makes ANOVA a natural procedure to determine the adequacy
of an adopted model. Also, the ANOVA is better posed in that the rate of
the convergence is the square of the rate for $\hat\rho_t - \rho_t$.

The "ANOVA for diffusions" approach is appealing also from the position 
of applications. As long as one can collect the data as a process, one can rely 
on the proposed ANOVA methodology to draw inference without imposing 
parametric structure on the underlying process. In financial applications, as 
in Section 4.5, it can test whether a financial derivative can be fully hedged 
in another asset. In the event of nonperfect hedging, Theorems 1 and 2 tell
us how to quantify the amount of hedging error, as well as its distribution.

6. Convergence in law — Proofs and further results. In the following we 
deal with processes that are exemplified by $[Z,Z] - \langle Z, Z\rangle$. We mostly
follow [32].

Proof of Proposition 1. The applicable parts of the proof of the
cited Theorems 5.1 and 5.5 of [32] carry over directly under Assumptions 
A(i) and A(ii). When modifying the proofs, as appropriate, = max(t-"^ < 
t) replaces [tn]/n, 6n replaces and so on. For example, the right-hand 
side of their equation (5.10) on page 290 becomes Kd'^. The main change 
due to the nonequidistant case occurs in part (iii) of Jacod and Protter's
Lemma 5.3, pages 291 and 292, where in the definition of a„, ^B^!' should be 
replaced by {H{tr + t) — H{tr))B'^J' . Assumptions A(i) and A(ii) are clearly 
sufficient. □ 



Note that the result extends in an obvious fashion to the case of
multidimensional $Z = (Z^{(1)}, \ldots, Z^{(p)})$. Also, instead of studying $[Z, Z] - \langle Z, Z\rangle$, one
can, like [32], state the (more general) result for $\int_0^t (Z^{(i)}_u - Z^{(i)}_{u_*})\,dZ^{(j)}_u$.

In the sequel we shall also be using a triangular array form of Proposition
1; see the ends of the proofs of Theorems 1 and 2.

Proposition 3 (Triangular array version of the discretization theorem).
Let $Z$ be a vector Itô process for which $\int_0^T \|\langle Z, Z\rangle'_t\|^2\,dt < \infty$ and
$\int_0^T \|\tilde Z_t\|^2\,dt < \infty$ a.s. Also, suppose that $Z^{(n)}$, $n = 1, 2, \ldots$, are Itô processes
satisfying the same requirement, uniformly. Suppose that the (vector) Brownian
motion $W$ is the same in the Itô process representations of $Z$ and of all the
$Z^{(n)}$, that is,

(6.1)   $dZ^{(n),MG}_u = \sigma^{(n)}_u\,dW_u$ and $dZ^{MG}_u = \sigma_u\,dW_u$.

Suppose that

(6.2)   $\int_0^T \|\sigma^{(n)}_u - \sigma_u\|^2\,du = o_p(1)$.

Then, subject to Assumptions A(i) and (ii), the processes
$\frac{1}{\sqrt{\overline{\Delta t}}} \int_0^t (Z^{(n),(i)}_u - Z^{(n),(i)}_{u_*})\,dZ^{(n),(j)}_u$
converge jointly with the processes
$\frac{1}{\sqrt{\overline{\Delta t}}} \int_0^t (Z^{(i)}_u - Z^{(i)}_{u_*})\,dZ^{(j)}_u$
to the same limit.



If one requires stable convergence, one just imposes System Assump- 
tion III; see Theorem 11.2, page 338, and Theorem 15.2(c), page 496, of [30]. 

Proof of Proposition 3. This is mainly a modification of the devel- 
opment on page 292 and the beginning of page 293 in [32], and the further 
development in (their) Theorem 5.5 is straightforward. Again we recollect 
that H [from Assumptions A(i) and (ii)] is Lipschitz continuous. 

Note that to match the end of the proof of Theorem 5.1, we really need 
Op((5^). This, of course, follows by appropriate use of subsequences. □ 



Finally, we show the result for Section 3.4. 



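Proof of Proposition 2. Let $t_*$ be the largest grid point smaller than
or equal to $t$. Then by Itô's lemma,
$d(Z_t - Z_{t_*})^4 = 4(Z_t - Z_{t_*})^3\,dZ_t + 6(Z_t - Z_{t_*})^2\,d\langle Z, Z\rangle_t$. Hence,

(6.3)   $[Z,Z,Z,Z]_t = \sum_{t_{i+1} \le t} 4\int_{t_i}^{t_{i+1}} (Z_u - Z_{t_i})^3\,dZ_u + 6\sum_{t_{i+1} \le t} \int_{t_i}^{t_{i+1}} (Z_u - Z_{t_i})^2\,d\langle Z, Z\rangle_u$
$\qquad = 6\sum_{t_{i+1} \le t} \int_{t_i}^{t_{i+1}} (Z_u - Z_{t_i})^2\,d\langle Z, Z\rangle_u + O_p((\delta^{(n)})^{3/2-\epsilon})$

using Lemma 2 below.

Define an interpolated version of $[Z, Z] = [Z, Z]^{\mathcal{G}}$ by letting
$[Z, Z]^{\mathrm{interpol}}_t = (Z_t - Z_{t_*})^2 + [Z, Z]_{t_*}$. Then, again by Itô's lemma,

(6.4)   $\sum_{t_{i+1} \le t} \int_{t_i}^{t_{i+1}} (Z_u - Z_{t_i})^2\,d\langle Z, Z\rangle_u = \frac{1}{4}\big\langle [Z, Z]^{\mathrm{interpol}} - \langle Z, Z\rangle,\ [Z, Z]^{\mathrm{interpol}} - \langle Z, Z\rangle \big\rangle_{t_*}.$

Putting together (6.3) and (6.4), and Proposition 1, as well as Corollary
VI.6.30, page 385 of [33], we obtain (3.8).

Replacing $Z$ by $\hat Z$, and creating interpolated versions of $\hat Z$ and $[\hat Z, \hat Z]$ as
at the beginning of the proof of Theorem 1 below, (3.7) also follows. □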



7. Proofs of main results. 



7.1. Notation. In the following proofs $t_i$ always means $t_i^{(n)}$, $h$ always
means $h_n$, $\overline{\Delta t}$ means $\overline{\Delta t}^{(n)}$ and so on. Also, $t_*$ is the largest grid point less
than or equal to $t$, that is, $t_* = \max\{t_i^{(n)} \le t\}$. We sometimes write $\langle X, X\rangle_t$ as
$\langle X\rangle_t$, and $\langle X, X\rangle'_t$ as $\langle X\rangle'_t$ for simplicity. Also, for convenience, we adopt the
following shorthand for smoothness assumptions for Itô processes:

Assumption B (Smoothness).

B.0(X): $X$ is in $C^1[0,T]$.

B.1(X,Y): $\langle X, Y\rangle_t$ is in $C^1[0,T]$.

B.2(X): the drift part of $X$ ($X^{DR}$) is in $C^1[0,T]$.

Assumption B(X) is equivalent to B.1(X,X) and B.2(X).
We shall also be using the following notation:

$\Upsilon^X(h) = \sup_{u,s} |X_u - X_s|$ and $\Upsilon^{XY}(h) = \sup_{u,s} |\langle X, Y\rangle'_u - \langle X, Y\rangle'_s|$,

where $\sup_{u,s}$ means $\sup_{u,s \in [0,T],\ |u-s| \le h}$.

7.2. Preliminary lemmas and proof of Theorem 1. 

Lemma 1. Let $M_{i,n}(t)$, $0 \le t \le T$, $i = 1, \ldots, k_n$, $k_n = O(\overline{\Delta t}^{-1})$, be a
collection of continuous local martingales. Suppose that
$\sup_{1 \le i \le k_n} \langle M_{i,n}, M_{i,n}\rangle_T = O_p(\overline{\Delta t}^{\beta})$. Then, for any $\epsilon > 0$,
$\sup_{1 \le i \le k_n} \sup_{0 \le t \le T} |M_{i,n}(t)| = O_p(\overline{\Delta t}^{\beta/2-\epsilon})$.

The above follows from Lenglart's inequality. As a corollary, the following
is true.

Lemma 2. Let $X$ and $Y$ be Itô processes. If $X$ satisfies that $|\tilde X|$ and
$\langle X, X\rangle'$ are bounded a.s., then for any $\epsilon > 0$, $\Upsilon^X(\eta) = O_p((\eta + \overline{\Delta t})^{1/2-\epsilon})$.
Similarly, if $X$ and $Y$ satisfy that $|\tilde D^{XY}|$ and $\langle R^{XY}, R^{XY}\rangle'$ are bounded (a.s.),
then for any $\epsilon > 0$, $\Upsilon^{XY}(\eta) = O_p((\eta + \overline{\Delta t})^{1/2-\epsilon})$.



Lemma 3. Suppose X , Y and Z are ltd processes, and make Assump- 
tions A{i), B.O(y), B[(X),(Z)] and B.1(X, Z). Also, assume that for any 
e>0, asn^oo, (At)i/2-^/i-V2 = o(l). Then 

t~h<ti<tij^-i<t 

as oo, where Y^ = dYu/du. 



{Xu - Xt^){Z^ - Zu) dY^ - ^-^{X, Z)[Yu 



U 



Proof of Lemma 3. Without loss of generality, it is enough to show 
the result for X = Z. This is because one can prove the results for X, Z 
and X + Z, and then proceed via the polarization identity. The conditions 
imposed also mean that the assumptions of Lemma 3 are also satisfied for 
X + Z. Let 

1 



It 



lit 



1 



E 



'^^\{X)u - {X)u)dY^ - ]^{X)[Yt^{^t,Y 



/l2 E 

t~h<ti<t^+i<t^^ 



[{Xu-Xuf-{{X)u-{X)u)]dYu. 



Obviously sup^ \It\ = 0p{Ath ^). It is then enough to show 



as follows. 

By Ito's lemma, 

2 



sup I //j I = Op{Ath 
t 



iit=^ E nx^-xtjdx^'^dYu 



t—h<ti<ti^i<-t 



2 ^ 



ti+i 



t-h<ti<ti+i<t^' 



{x,-Xt,)dx:; 



mg 



dYu. 



lit
Recall that $dX^{DR}_v = \tilde X_v\,dv$. Then by B.0(Y), B.2(X) and the continuity of
X, one gets

< sup sup I^^uIt^ > / / 



t— /l<ti<ti+l<t 

For //t^2) using integration by parts, 
1 /"^i+i r 



MG 



du 



(7.1) 



t-h<ti<tij^-i<t 

Equation (7.1) has quadratic variation bounded by 

1 x-^ f^^+l 



T4SUP j^*\Xu-Xtf{ti+i-ufd{X)u 
(7.2) <-Jjsup(X);^sup P^\T''{6^^^))\u+,-ufdu 



= Op{At^ ^/i"3) 

by Lemma 2 under Assumptions A(i) and B(X). Following Lemma 1 and 
B.O(y), supj \IIt,2\ = Opi'At^^'^^'h^^/^). The result then follows. 

Definition 5. Suppose X and Y are continuous Ito processes. Let 



(7.3) B^^, = { I 



I X 

XY_}- {{U-h)-u)d{X,Yy^, t>ti-h, 
' ■ — ' h Jti-h 

0, otherwise 



[21 ^-^ 

^ / {Xs-Xt^)dYs, t>ti-h, 



and 

0, otherwise, 
where [2] indicates symmetric representation s.t. [2] J X dY = J X dY + jYdX. 

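Note that by integration by parts via Itô's lemma,
$B^{XY}_{1,t_i} = \frac{1}{h}(\langle X, Y\rangle_{t_i} - \langle X, Y\rangle_{t_i-h}) - \langle X, Y\rangle'_{t_i}$
and, hence, $\widehat{\langle X, Y\rangle}'_{t_i} - \langle X, Y\rangle'_{t_i} = B^{XY}_{1,t_i} + B^{XY}_{2,t_i}$.

Lemma 4. Under Assumptions A(i) and B(X, Y, $\langle X, Y\rangle'$) and the order
selection of $h^2 = O(\overline{\Delta t})$, for any $\epsilon > 0$,

$\sup_{0 \le t_i \le T} |B^{XY}_{1,t_i}| = O_p(\overline{\Delta t}^{1/4-\epsilon})$ and $\sup_{0 \le t_i \le T} |B^{XY}_{2,t_i}| = O_p(\overline{\Delta t}^{1/4-\epsilon})$.

In particular,

$\sup_{0 \le t_i \le T} |\widehat{\langle X, Y\rangle}'_{t_i} - \langle X, Y\rangle'_{t_i}| = O_p(\overline{\Delta t}^{1/4-\epsilon})$.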

In addition, under condition B{{R^^ ,R^^)'), 



(7.5) sup 



U 



(7.6) ^uv\{Bir .BlJ)^-Gt\=Op{/\t'''), 



where Gt = ^ Et-h<u<u^,<ti{^^ ^Yu ^Yu + ^Yu Z)[JiAt,f and 

(7.7) sup I (5,7, ^2^), I = Op (^). 

Proof of Lemma 4. Using Definition 2, write 

(7.8) B^Z = \f iiti-h)- u) dR^^ + \f {{t, -h)-u) dD^"" . 



XY,MG fjA'y,DR 

Xy,DR| 



Under Assumption B.2{{X,Y)'), sup^lB^ ■ I, \ =Op{h). Also, by Assump- 
tions A(i) and B.1{R^^ , R^^), we have that supo<„<T = Op(l), 
whence sup,{Bf/ , B^/)t = Op((At^"V/^)- So sup^supj^™^! = 
Op{At^^^ ^) by Lemma 1 and, hence, supj | -6^'''^^ | = Op(At^^^ ^). By sim- 
ilar methods. Lemmas 1 and 2 can be used to show that supj|i?2?'l — 

The orders (7.5)-(7.7) follow from the representation (7.8), and deriva- 
tions similar to those of the proof of Lemma 3. □ 

The following is now immediate from Lemma 4, by Taylor expansion. 

Corollary 1. Suppose $\Xi$, $S$ and $\rho$ are Itô processes, where $\rho$ and $\hat\rho$ are
as defined in Section 2. Then, under conditions A(i), B($\Xi$, $S$, $\langle \Xi, S\rangle'$, $\langle S, S\rangle'$)
and (2.7), for any $\epsilon > 0$, we have

$\sup_{t_i \in [0,T]} |\hat\rho_{t_i} - \rho_{t_i}| = O_p(\overline{\Delta t}^{1/4-\epsilon})$.

In the rest of this subsection we shall set, by convention, $\hat\rho_t$ to the value
$\hat\rho_{t_*}$, even if the definition in Section 2.4 permits other values of $\hat\rho_t$ between
sampling points. This is no contradiction since we only use the $\hat\rho_{t_i}$ in our
definition of $\hat Z$ and in the rest of our development.

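Similarly, we extend the definition of $\hat Z_t$ given at the beginning of
Section 2.5 to the case where $t$ is not a sampling point. Specifically,

$\hat Z_t = \hat Z_{t_*} + \Xi_t - \Xi_{t_*} - \hat\rho_{t_*}(S_t - S_{t_*})$
$\quad = \hat Z_{t_*} + \Xi_t - \Xi_{t_*} - \int_{t_*}^t \hat\rho_u\,dS_u$

by our convention for $\hat\rho_t$. We emphasize that $\hat Z_t$ is no more observable than
$S_t$ or $\Xi_t$. By this definition of $\hat Z$, in view of (1.6) and (1.5),

(7.9)   $\langle \hat Z, \hat Z\rangle_t = \langle Z, Z\rangle_t + \int_0^t (\hat\rho_u - \rho_u)^2\,d\langle S, S\rangle_u$.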

We then obtain the following preliminary result for Theorem 1. 

Proposition 4. Under the conditions of Theorem 1,
$(\langle \hat Z, \hat Z\rangle_t - \langle Z, Z\rangle_t)/\sqrt{\overline{\Delta t}} = D_t + o_p(1)$, uniformly in $t$, where $D_t$ is given by (3.2).

Proof, (i) Let Li^nit) = Yl'j=iLj,i,n{t), where for j = 1,2, Lj^i.a{t) = 

i^ftAti - Pti-hBf^tAtJ/ {S, S)[^_h for t>ti-h, and zero otherwise. It is now 
easily seen from Lemmas 2 and 4, and Corollary 1, and by the same methods 
as in these results, that Lj^„(t) approximates pt — pt and, in particular, that 



(7.10) j\pu-Pu?d{S,S)u= E ^?,„(t.)(5,5);^_,At, + 0p(At' 



.-T— 3/4-e< 

^i,n\H)\J,>='lt,-h'^H ^ '-'p 

uniformly in t, for any e > 0. We now show that 
Ll^{t,){S,S)[^_j,At, 

U+i<t 

(7.11) _ 

= E (^.,n),^(S,5);^„,At, + 0p(At'/' 
U+i<t 

To this end, set Y^'\t) = Ll^{t) - {Li^n)t and 

(7.12) Z„.,t= E Y('\ti){S,Syt^_^AU + Y^'Ht,){S,S)l^,At*. 



By Lenglart's inequality ([33], Lemma I.3.30, page 35), (7.11) follows if one
can show that $\langle Z_n, Z_n\rangle_T = O_p(\overline{\Delta t}^{3/2-\epsilon})$, for any $\epsilon > 0$. This is what we do
in the rest of (i).

Since Lj^n)t = if (tj - h, ti] and (tj - h, tj] are disjoint for any ti<T

and tj < T, the quadratic variation of the Z„ is considered in the overlapping 
time interval: 

{Zn, Zn)T 



< sup {{S,S) 

0<t<T 



■ ■ J{u-h,u]n{tj-h,tj] 
(7.13) <2 snpi{S,Sy,fY.i^U){At,)I^,^^.,^^,J[ 

0<t<T \J{tj-h,U] J 



1/2 



The above line follows from the Kunita-Watanabe inequality (page 69 of [38]). 

By Ito's formula, ior U - h < t < U, {Y^'^)[ = 'iLl^{t){Li^nYf Hence, by 
Lemma 4, (y^'^'>)[ < Ui{Li^n)'ti where Ui is defined independently of i, and 

Ui = Op(At^^^ Also, by the Kunita-Watanabe inequality and the Cauchy- 
Schwarz inequality, 

(7-14) {U,n)[ < 4 sup TT^^Vl^ E((i?|f ); + pl^H{Bf,f)[), 

0<u<T [{J, >J)u) j^i 

where BJ-^'^, j = 1,2, is defined before Lemma 4. Thus, it is enough that 

(7.13) is Op(At^^^"^) in the two cases where {Y^^^)'f- is replaced by (a) 
Ui{B^/)[, for {X,Y) = {E,S) and {S,S), and (b) Ui{B^/)[, for {X,Y) = 

(H, 5), {S, H) and (5, 5). We do this in the case of (B^^^)'^. The second case 
is similar. 

Obviously, on U - h<t<ti, {Bf/)[ = ^{t-{ti- h)f{R^^)'t. Also, set 
Nn = sup( #{j -.t < tj < t + h] and 5^^^ = vain (4"\ — t^"^), and note that 
Nn = Oih/d^L^) = 0{h/~M) under Assumption A. Since supo<„<r {R^^)'u < 
oo, the part of equation (7.13) which is attributable to the B^''^ term be- 
comes, up to an Op{l) factor, 

(7.15) [/i^ J2 [h'-{t,-t,ff'[t,-it,-h)f' = 0,(Ai'/'~'). 

i<j:tj—ti<h 

(ii) To finish the proof of the proposition, we wish to show that, as
$\overline{\Delta t}^{1/2}/h \to c$,

(7.16)   $\overline{\Delta t}^{-1/2} \sum_{t_{i+1} \le t} \langle L_{i,n}, L_{i,n}\rangle_{t_i} \langle S, S\rangle'_{t_i-h}\,\Delta t_i \to D_t$

in probability, for each $t$. (This is enough, since the convergence of increasing
functions to an increasing function is automatically uniform.) Since

supj I (Li^j^„, L2,i,n)ti I = Op{At'^^^) from Lemma 4, it follows that it is suf- 
ficient to prove separately that 

^-1/2 

P 1 



(7.17) 



{p,py^d{s, s)u 



and 



(7.18) 



3c Jo 



ti + l<t 



c 



f-pi H'{u)d{S,S)^. 



lo \{s,syu 

Equation (7.17) follows directly from the approximation (7.5) in Lemma 4. 
It remains to show (7.18). This is what we do for the rest of the proof. 
Let A be an Ito process, which we shall variously take to be js^^ — (s,s)' 

2 

and ^g^gy . Consider a subproblem of (7.18), that of the convergence of 

-1/2 

By (7.6) in Lemma 4, this is (uniformly in t) equal to 



(7.19) At-'^' J2 {B^/,Bir),A.-hAU. 



At E f{tj){At,fAt^_hAU + Op{l), 

ii + l l£t ti — h<tj <tj + l <ti 

where f{t) = {X, Zy^iY, Vy^ + {X, Vyt{Y, Zy^. By interchanging the two sum- 
mations, (7.19) then becomes, up to Op(l), 

tj+2<t 



This results because the difference between the last two terms is bounded 
by 

$\overline{\Delta t}^{-1/2} h^{-1} \sum_j |f(t_j)|\,(\Delta t_j)^2\,\Upsilon^A(h) \le \sup_t |f(t)|\,\Upsilon^A(h)\,\overline{\Delta t}^{1/2} h^{-1} H_n(t)$
by Lemma 2. Hence, (7.19) converges to $c\int_0^t f(u) A_u\,dH(u) = c\int_0^t f(u) A_u H'(u)\,du$
by Assumption A and since $A$ and $f$ are bounded and continuous.
Note that $H$ is absolutely continuous since it is Lipschitz. The result (7.18)
now follows by aggregating this convergence for the cases of {B2i , B2i) 

{A = j^), (i?f|,i?|f) {A = -ji^) and (i?|f,i?|f> {A = j^). □ ' 

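Proof of Theorem 1. In view of Proposition 4, it is enough to show
that

(7.20)   $\sup_t \big|[\hat Z, \hat Z]_t - \langle \hat Z, \hat Z\rangle_t - ([Z, Z]_t - \langle Z, Z\rangle_t)\big| = o_p(\overline{\Delta t}^{1/2})$.

From (7.9), $\langle \hat Z, \hat Z\rangle'_t - \langle Z, Z\rangle'_t = (\hat\rho_t - \rho_t)^2 \langle S, S\rangle'_t$, and similarly for the drift of
$\hat Z$ and $Z$. The result then follows from Proposition 3 and Corollary 1. □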

7.3. Additional lemmas for Theorem 2, and proof of the theorem. By the 
same methods as above, we obtain the following two lemmas, where the first 
is the key step in the second. 

Lemma 5. Let X , Y and A he ltd processes. Let h = 0{At ). Assume 
Assumptions A, B.1[(A:, X), {A, A), {Y, Y), {A, X), {A, Y)] and B.2[(A:), (A), (Y)] . 
Define 

y,^^= {AXt^){AYtJ + {Xt-Xt,){Yt-Yt,), 

U+i<t 

(7.21) Uit) = \fY. AuIiuM+Mu-h,u])^Vt, du 

n JO ■ 

-It. AAVuih-At,). 

Then supt\U{t)\ = Op(Ki^^^). 

Lemma 6. Let H, S, p and Z be the ltd processes defined in earlier 
sections. Subject to the regularity conditions in Lemma 5 with {X,Y) = 
(H, S), {S, S), or {p, S), and with A = p, p^ or Z , 

f {pu- Pu)d{E,S)u = [^,Z]t-[Z,Z]t 
Jo 

-o/ Pud{p,{S,S)')^ + Op{h), 
3 Jo 

uniformly in t. 






P. A. MYKLAND AND L. ZHANG 



Proof of Theorem 2. By definition, A(Z, = A[H, . Since 
d{E,Z)t = d{Z,Z)t by assumption, and by subtracting and adding {'E, Z)t, 

M~"\(z^)t-{Z,Z)t) 

= At"'/'([H, Z]t - (H, Z)t) + At"'/'((H, Z)t - {E, Z)t) . 
Ci C2 

First notice that 

(7.22) {E,Z)[ = {E,Z)[ + {pt-pt){E,Sy,. 

Also, as At /h^ c, (7.22) and Lemma 6 show that 
1 /■* 

C2 = , / {pu - Pu) d{E, S)u 

[Z,Z]t-[E,Z]t , 1 /•* 



1 /-t ^ 
+ ir i Pu d{p, {S, S)')^ + Op(l) uniformly in t. 
6c Jo 



Since {E,Z)t = {Z,Z)t, it follows that 

Ai'/\{Zy)t-{Z,Z)t) 

= Ai'^\[E, Z]t - {E, Z)t + [Z, Z]t - [E, Z]t) 

+ ^J\ud{p,{s,sy)^ + o,{i) 

= M^'\[Z,Z]t - {Z,Z)t) + ^ j%ud{p, {S,Sy)^ 

+ At'/'([H, Z-Z]t- {E, Z - Z)t) + Op(l). 

The last component in the above. At ([H, Z — Z]^ — (E, Z — Z)t), goes to 
zero in probability by Proposition 3, since {E,E)'^, {Z — Z,Z — Z)'^, and 
{E, Z — Z)'^ satisfy the conditions of this proposition. This is in view of 
Corollary 1. The argument is similar to that at the end of the proof of 
Theorem 1. □